adding percentile lines to a density plot [duplicate] - r

This question already has answers here:
Shading a kernel density plot between two points.
(5 answers)
Closed 10 years ago.
I have some data dt = data.table(x=c(1:200),y=rnorm(200)) and I start with a density plot using ggplot2:
plot = ggplot(dt,aes(y)) + geom_density(aes(y=..density..))
Is there a way I can add percentile lines similar to this?
If further I could shade the segments of the graph (created by the percentile lines) similar to this, then that would be great!

Here is a possibility heavily inspired by this answer:
library(data.table)
library(ggplot2)
dt <- data.table(x=c(1:200),y=rnorm(200))
dens <- density(dt$y)
df <- data.frame(x=dens$x, y=dens$y)
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
quantiles <- quantile(dt$y, probs=probs)
# label each point of the density curve with the quantile interval it falls in
df$quant <- factor(findInterval(df$x,quantiles))
ggplot(df, aes(x,y)) +
geom_line() +
geom_ribbon(aes(ymin=0, ymax=y, fill=quant)) +
scale_x_continuous(breaks=quantiles) +
scale_fill_brewer(guide="none")
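The shading above marks the quantile intervals but does not draw the percentile lines the question asks for. One way to add them, as a minimal sketch: interpolate the height of the density curve at each quantile and draw a segment from the axis up to the curve. The seed and the names `dens_df`, `q_df` and `height` are assumptions made for this illustration.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only so the example is reproducible
y <- rnorm(200)

dens <- density(y)
dens_df <- data.frame(x = dens$x, y = dens$y)

probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
q_df <- data.frame(prob = probs,
                   value = unname(quantile(y, probs = probs)))
# height of the density curve at each quantile, so each line stops at the curve
q_df$height <- approx(dens$x, dens$y, xout = q_df$value)$y

p <- ggplot(dens_df, aes(x, y)) +
  geom_line() +
  geom_segment(data = q_df,
               aes(x = value, xend = value, y = 0, yend = height),
               linetype = "dashed")
p
```

If full-height lines are acceptable, `geom_vline(xintercept = q_df$value)` is shorter and needs no interpolation.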

myd = data.frame(xvar=rnorm(2000),yvar=rnorm(2000))
xd <- data.frame(density(myd$xvar)[c("x", "y")])
p <- ggplot(xd, aes(x, y)) +
geom_area(data = subset(xd, x < -1), fill = "pink") +
geom_area(data = subset(xd, x < -1.96), fill = "red") +
geom_area(data = subset(xd, x > 1), fill = "lightgreen") +
geom_area(data = subset(xd, x > 1.96), fill = "green") +
geom_line()
p

Related

How to embed the number of observations into violin plots?

I want to put data on facets of violin plots and annotate these violins with the number of observations used to plot the violin.
Here is an example of what I have without observation counts:
library(ggplot2)
library(dplyr)
library(tidyverse)
data("iris")
c <- rep(c('r', 'g', 'b'), 50)
c <- sample(c)
facet_row <- rep(c('row1', 'row2', 'row3', 'row4', 'row5'), 30)
facet_col <- rep(c('col1', 'col2', 'col3'), 50)
iris$facet_rows <- facet_row
iris$facet_cols <- facet_col
iris$color <- c
iris$count <- sample(1:10, size = 150, replace = T)
p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
geom_violin(alpha = 0.7, na.rm = T) +
coord_flip() +
facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))
print(p)
Result:
I want to put the number of observations right behind those violins.
I tried this so far:
count_data <- function (y){
df <- data.frame(y = min(y) - 0.2, label = length(y))
return(df)
}
p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
geom_violin(alpha = 0.7, na.rm = T) + stat_summary(fun.data = count_data, geom = "text", aes(group = Species)) +
coord_flip() +
facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))
print(p)
This produces an output with an issue:
Grouped violins now share a single count value. The problem is that those violins will most definitely have different numbers of observations.
I have tried to just draw a geom_text using precomputed number of observations
(assume that iris$count actually contains observation counts that will have the same value for different rows, but random here):
p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
geom_violin(alpha = 0.7, na.rm = T) + geom_text(aes(label=count, y=Petal.Length), nudge_y = -0.1) +
coord_flip() +
facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))
print(p)
This has a similar problem to the previous approach:
It has values for two violins in the same group in one line.
Each violin repeats the number of observations once for each observation.
I am relatively new to R. I feel like there is a clean way to do this, but I can't figure it out...
Removing the explicit grouping and adding position_dodge resolved the issue:
count_data <- function (y){
df <- data.frame(y = min(y) - 0.2, label = length(y))
return(df)
}
p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
geom_violin(alpha = 0.7, na.rm = T) + stat_summary(fun.data = count_data, geom = "text", position = position_dodge(1)) +
coord_flip() +
facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))
print(p)
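If the counts are computed ahead of time, as the question suggests, the same dodging idea applies to geom_text. The following is only a sketch under assumptions: it drops the facets for brevity, uses a made-up seed, and introduces a hypothetical summary table `counts` with one row per violin. With facets, the facet columns would also need to go into `group_by()`.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # assumed seed so the random colour assignment is reproducible
d <- iris
d$color <- sample(rep(c("r", "g", "b"), 50))

# one row per violin: its size and a label position just below its minimum
counts <- d %>%
  group_by(Species, color) %>%
  summarise(n = n(), y = min(Petal.Length) - 0.2, .groups = "drop")

p <- ggplot(d, aes(x = Species, y = Petal.Length, fill = color)) +
  geom_violin(alpha = 0.7) +
  # text has no width of its own, so the dodge width is given explicitly;
  # 0.9 is the default dodge width used for the violins
  geom_text(data = counts, aes(y = y, label = n, group = color),
            position = position_dodge(width = 0.9)) +
  coord_flip()
p
```

Because `counts` holds one row per violin, each label is drawn once instead of once per observation.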

Pass changed geom from object to other ggplot

I first make a plot
df <- data.frame(x = c(1:40, rep(1:20, 3), 15:40))
p <- ggplot(df, aes(x=x, y = x)) +
stat_density2d(aes(fill='red',alpha=..level..),geom='polygon', show.legend = F)
Then I want to modify the computed density values and use them in another plot.
# build plot
q <- ggplot_build(p)
# Change density
dens <- q$data[[1]]
dens$y <- dens$y - dens$x
Build the other plot using the changed densities, something like this:
# Build another plot
ggplot(df, aes(x=x, y =1)) +
geom_point(alpha = 0.3) +
geom_density2d(dens)
This does not work, however. Is there a way of doing this?
EDIT: doing it when there are multiple groups:
df <- data.frame(x = c(1:40, rep(1:20, 3), 15:40), group = c(rep('A',40), rep('B',60), rep('C',26)))
p <- ggplot(df, aes(x=x, y = x)) +
stat_density2d(aes(fill=group,alpha=..level..),geom='polygon', show.legend = F)
q <- ggplot_build(p)
dens <- q$data[[1]]
dens$y <- dens$y - dens$x
ggplot(df, aes(x=x, y =1)) +
geom_point(aes(col = group), alpha = 0.3) +
geom_polygon(data = dens, aes(x, y, fill = fill, group = piece, alpha = alpha)) +
scale_alpha_identity() +
guides(fill = F, alpha = F)
Results when applied to my own dataset
Although this is exactly what I'm looking for, the fill colors do not seem to correspond to the initial colors (linked to A, B and C):
Like this? It is possible to plot a transformation of the shapes plotted by geom_density. But that's not quite the same as manipulating the underlying density...
ggplot(df, aes(x=x, y =1)) +
geom_point(alpha = 0.3) +
geom_polygon(data = dens, aes(x, y, fill = fill, group = piece, alpha = alpha)) +
scale_alpha_identity() +
guides(fill = F, alpha = F)
Edit - OP now has multiple groups. We can plot those with the code below, which produces an artistic plot of questionable utility. It does what you propose, but I would suggest it would be more fruitful to transform the underlying data and summarize that, if you are looking for representative output.
ggplot(df, aes(x=x, y =1)) +
geom_point(aes(col = group), alpha = 0.3) +
geom_polygon(data = dens, aes(x, y, fill = group, group = piece, alpha = alpha)) +
scale_alpha_identity() +
guides(fill = F, alpha = F) +
theme_minimal()
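On the colour mismatch raised in the question: the `fill` column returned by ggplot_build() already holds literal colour strings, so mapping it through the default fill scale assigns a fresh palette to those strings. A possible fix, sketched here under the assumption of a reasonably recent ggplot2 (for `after_stat()`), is to draw the stored colours as they are with scale_fill_identity(). The `interaction(group, piece)` grouping is a defensive choice so that each contour piece becomes its own polygon.

```r
library(ggplot2)

df <- data.frame(
  x = c(1:40, rep(1:20, 3), 15:40),
  group = c(rep("A", 40), rep("B", 60), rep("C", 26))
)

p <- ggplot(df, aes(x = x, y = x)) +
  stat_density_2d(aes(fill = group, alpha = after_stat(level)),
                  geom = "polygon", show.legend = FALSE)

# extract the computed polygons and shift them as in the question
dens <- ggplot_build(p)$data[[1]]
dens$y <- dens$y - dens$x

q <- ggplot(df, aes(x = x, y = 1)) +
  geom_point(aes(colour = group), alpha = 0.3) +
  geom_polygon(data = dens,
               aes(x, y, fill = fill, alpha = alpha,
                   group = interaction(group, piece))) +
  scale_fill_identity() +   # use the stored colours unchanged
  scale_alpha_identity()
q
```

The polygons then keep the colours of the first plot, which by default also match the colour scale of the points.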

Changing whisker length of multiple boxplot in R

I have a dataframe of 10 variables and I plotted it in two columns. But ggplot defines whiskers as the 5th and 95th percentiles. I want whisker lengths of Q1 - 1.5*IQR / Q3 + 1.5*IQR for each of these plots, with outliers drawn as usual. A similar question has been posted in this link, but I couldn't make use of it. Any help will be appreciated!!
library(ggplot2)
library(tidyr)
df <- data.frame(matrix(rnorm(2000), ncol = 10))
plot.data <- gather(df, variable, value)
# plot.data$out <- as.numeric(rep(input_data, each = nrow(x_train)))
p <- ggplot(plot.data, aes(x = 0, y=value))
p <- p + geom_boxplot()
#p <- p + geom_point(aes(x = 0, y = test_data), color = "red")
p <- p + facet_wrap(~variable, scales = "free_x", strip.position = 'top', ncol = 2)
p <- p + coord_flip()
p <- p + xlab("") + ylab("")
p <- p + theme(legend.position="none") + theme_bw()
p <- p + theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
p
By default (coef = 1.5), geom_boxplot() already uses the 1.5*IQR rule: each whisker extends from the hinge to the most extreme observation that lies within 1.5*IQR of it, and points beyond that are drawn as outliers. See a more current question link. The exact values do depend on the quantile and IQR definitions used, though.
If you insist on setting them manually with stat_summary:
# geom_boxplot parameters with stat summary
f <- function(x) {
r <- quantile(x, probs = c(0.25, 0.25, 0.5, 0.75, 0.75))
r[[1]]<-r[[1]]-1.5*IQR(x) #ymin lower whisker, as per geom_boxplot
r[[5]]<-r[[5]]+1.5*IQR(x) #ymax upper whisker
names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
r
}
# To subset the outlying points for plotting,
o <- function(x) {
r <- quantile(x, probs = c(0.25, 0.75))
r[[1]]<-r[[1]]-1.5*IQR(x)
r[[2]]<-r[[2]]+1.5*IQR(x)
subset(x, x < r[[1]] | r[[2]] < x)
}
# added seed for consistency
set.seed(123)
df <- data.frame(matrix(rnorm(2000), ncol = 10))
plot.data <- gather(df, variable, value)
# plot.data$out <- as.numeric(rep(input_data, each = nrow(x_train)))
p <- ggplot(plot.data, aes(x = 0, y=value))
p <- p + stat_summary(fun.data = f, geom="boxplot")+
stat_summary(fun.y = o, geom="point")
#p <- p + geom_point(aes(x = 0, y = test_data), color = "red")
p <- p + facet_wrap(~variable, scales = "free_x", strip.position = 'top', ncol = 2)
p <- p + coord_flip()
p <- p + xlab("") + ylab("")
p <- p + theme(legend.position="none") + theme_bw()
p <- p + theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
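Note the difference between the two conventions: the function `f` above places the whisker ends exactly on the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR, whereas the geom_boxplot() default stops each whisker at the last observation inside the fence. The base R sketch below computes both for one sample; the seed and variable names are assumptions for illustration only.

```r
set.seed(123)  # assumed seed, for reproducibility only
x <- rnorm(200)

q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# fences: the furthest a whisker is allowed to reach
fence_lo <- q1 - 1.5 * iqr
fence_hi <- q3 + 1.5 * iqr

# default-style whiskers end at the most extreme observations inside the fences
whisker_lo <- min(x[x >= fence_lo])
whisker_hi <- max(x[x <= fence_hi])

# anything beyond the fences would be drawn as an outlier point
outliers <- x[x < fence_lo | x > fence_hi]

c(fence_lo = fence_lo, whisker_lo = whisker_lo,
  whisker_hi = whisker_hi, fence_hi = fence_hi)
```

The whisker values always lie on or inside the fences, so a box drawn with `f` can have slightly longer whiskers than the default one.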

How can I fill the space between values(geom_line) and an intercept with ggplot2? Different Colors for values over and under intercept [duplicate]

This question already has answers here:
How to fill with different colors between two lines? (originally: fill geom_polygon with different colors above and below y = 0 (or any other value)?)
(4 answers)
Closed 3 months ago.
I want to make a graph with ggplot2 where the area between the intercept (= 1) and the values (which I connected through geom_line) is red if the values are lower than 1, or green if the values are greater than 1. The data is from Microsoft (price performance since 1999).
Data:
require(quantmod)
require(dplyr)
require(ggplot2)
getSymbols("MSFT", from ="1999-01-01")
microsoft <- data.frame(time(MSFT), MSFT[,6])
microsoft$time <- as.Date(microsoft$time.MSFT., "%Y-%m-%d")
microsoft <- microsoft %>%
mutate(change = MSFT.Adjusted - first(MSFT.Adjusted),
change.pc = change/first(MSFT.Adjusted)+1)
This is the ggplot I have so far:
ggplot(microsoft, aes(x = time, y = change.pc)) +
geom_line(stat = "identity") +
geom_hline(aes(yintercept=1), color="black") +
theme_bw() +
xlab("Jahr") + ylab("") +
ggtitle("Microsoft Kursentwicklung seit Januar 1999")
I want to fill the space between y = 1 and the values above it in green, and the space between y = 1 and the values below it in red. I tried geom_ribbon, geom_area and geom_polygon, but nothing worked. The biggest problem is that it fills the space green not only above y = 1 but also below it, and the red is not even visible...
Here is what I tried:
geom_area(data = subset(microsoft, change.pc > 1), fill = "green", alpha =0.5)
geom_area(data = subset(microsoft, change.pc < 1), fill = "red", alpha = 0.5)
I put these two lines in my plot, and then the problem I described above appeared.
Among other things I also tried this (found here on stackoverflow.com):
microsoft$grp <- "orig"
microsoft <- microsoft[order(microsoft$time),]
microsoft_new <- do.call("rbind",
sapply(1:(nrow(microsoft) -1), function(i){
f <- lm(time ~ change.pc, microsoft[i:(i+1), ])
if (f$qr$rank < 2) return(NULL)
r <- predict(f, newdata = data.frame(change.pc = 0))
if(microsoft[i, ]$time < r & r < microsoft[i+1, ]$time)
return(data.frame(time = r, change.pc = 0))
else return(NULL)
})
)
microsoft_2 <- rbind(microsoft, microsoft_new)
ggplot(microsoft_2, aes(x = time, y = change.pc)) +
geom_area(data = subset(microsoft_2, change.pc <= 1), fill = "red") +
geom_area(data = subset(microsoft_2, change.pc >= 1), fill = "blue") +
scale_x_continuous("", expand = c(0,0), breaks = seq(1999, 2017, 3)) +
theme_bw()
That didn't work either.
Does anyone have an idea how I could achieve what I need?
This is how it should look
I couldn't get your data to work, but using some made up data, the following approach looks like your example:
library(ggplot2)
set.seed(0)
microsoft <- data.frame(date=1:1000, y=cumsum(runif(1000)-0.5))
ggplot(microsoft, aes(x=date,y=y)) +
geom_ribbon(aes(ymin=pmin(y,0), ymax=0), fill="red", col="red", alpha=0.5) +
geom_ribbon(aes(ymin=0, ymax=pmax(y,0)), fill="green", col="green", alpha=0.5) +
geom_line(aes(y=0))
I found a very clean solution using the ggh4x package. Here it is:
library(ggplot2)
library(ggh4x)
set.seed(0)
microsoft <- data.frame(date=1:1000, y=cumsum(runif(1000)-0.5))
ggplot(microsoft, aes(x=date,y=y)) +
ggh4x::stat_difference(aes(ymin = 0, ymax = y)) +
geom_line(aes(y = y)) +
labs(fill = NULL)+
theme_bw()
You can use geom_ribbon for this. The following solution is similar to @Miff's solution, but with the intersection at 1. In addition, I have added the desired scales.
ggplot(microsoft, aes(x = time, y = change.pc)) +
geom_ribbon(aes(ymin=pmin(change.pc,1), ymax=1), fill="red", col="red", alpha=0.5) +
geom_ribbon(aes(ymin=1, ymax=pmax(change.pc,1)), fill="green", col="green", alpha=0.5) +
geom_hline(aes(yintercept=1), color="black") +
theme_bw(base_size = 16) +
scale_x_date(name = "Jahr",
date_breaks = "3 years",
date_minor_breaks = "1 year",
date_labels = "%Y") +
scale_y_continuous(name = "",
breaks = seq(.8, 2.8, by = .4),
labels = paste0(seq(80, 280, by = 40), "%")) +
ggtitle("Microsoft Kursentwicklung seit Januar 1999")
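The ribbon approach works because pmin() and pmax() clamp the series at the baseline, so each ribbon collapses to zero height wherever the series is on the other side. A tiny base R sketch with made-up values (the vector `y` and the column names are hypothetical):

```r
y <- c(0.8, 1.2, 0.9, 1.5)  # hypothetical values of change.pc
baseline <- 1

ribbons <- data.frame(
  y          = y,
  red_ymin   = pmin(y, baseline),  # 0.8 1.0 0.9 1.0
  red_ymax   = baseline,
  green_ymin = baseline,
  green_ymax = pmax(y, baseline)   # 1.0 1.2 1.0 1.5
)
ribbons
```

Since the ribbons are interpolated between observations, the fill can deviate slightly from the line right where it crosses the baseline. With densely spaced data such as daily prices this is unlikely to be visible.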

Add a specific value of x-axis on ggplot

I am using the ggplot function to plot this kind of graph (image).
I want to add a specific value to the x-axis, as shown in the picture.
This is my code:
quantiles <- quantile(mat,prob = quant)
x <- as.vector(mat)
d <- as.data.frame(x=x)
p <- ggplot(data = d,aes(x=x)) + theme_bw() +
geom_histogram(aes(y = ..density..), binwidth=0.001,color="black",fill="white") +
geom_density(aes(x=x, y = ..density..),fill="blue", alpha=0.5, color = 'black')
x.dens <- density(x)
df.dens <- data.frame(x = x.dens$x, y = x.dens$y)
p <- p + geom_area(data = subset(df.dens, x <= quantiles), aes(x=x,y=y),
fill = 'green', alpha=0.6)
print(p)
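One possible way to show the value on the axis is to append it to the axis breaks with scale_x_continuous(). This is only a sketch: `mat` and `quant` are not given in the question, so the data and the single 5% quantile below are assumptions, as is the helper name `brks`.

```r
library(ggplot2)

set.seed(1)  # assumed data, since `mat` is not shown in the question
x <- rnorm(1000, mean = 0, sd = 0.01)
q <- unname(quantile(x, probs = 0.05))  # assumed: a single 5% quantile

x.dens <- density(x)
df.dens <- data.frame(x = x.dens$x, y = x.dens$y)

# regular breaks plus the (rounded) quantile itself
brks <- sort(c(pretty(x), round(q, 3)))

p <- ggplot(data.frame(x = x), aes(x = x)) +
  theme_bw() +
  geom_density(fill = "blue", alpha = 0.5, color = "black") +
  geom_area(data = df.dens[df.dens$x <= q, ], aes(x = x, y = y),
            fill = "green", alpha = 0.6) +
  geom_vline(xintercept = q, linetype = "dashed") +
  scale_x_continuous(breaks = brks)
print(p)
```

If the extra label collides with a neighbouring break, dropping that neighbour from `brks` keeps the axis readable.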
