Density plots using ggplot2 [duplicate] - r

How can I add shading to both tails, like in the picture below?
I want to shade one tail from -Inf to -1.995 and the other from 1.995 to Inf.
I tried the solution here https://stackoverflow.com/a/4371473/3133957 but it doesn't seem to work.
Here is my code:
tmpdata <- data.frame(vals = t.stats)
qplot(x = vals, data = tmpdata, geom = "density",
      adjust = 1.5,
      xlab = "sampling distribution of t-statistic",
      ylab = "frequency") +
  geom_vline(xintercept = t.statistic(precip, population.precipitation),
             linetype = "dashed") +
  geom_ribbon(data = subset(tmpdata, vals > -1.995 & vals < 1.995),
              aes(ymax = max(vals), ymin = 0, fill = "red", alpha = 0.5))

You didn't provide a dataset for your question, so I simulated one to use for this answer. First, make your density plot:
library(ggplot2)
# Simulated stand-in for the asker's t.stats
tmpdata <- data.frame(vals = rnorm(10000, mean = 0, sd = 1))
plot <- qplot(x = vals, data = tmpdata, geom = "density",
              adjust = 1.5,
              xlab = "sampling distribution of t-statistic",
              ylab = "frequency")
Then, extract the x and y coordinates used by ggplot to plot your density curve:
area.data <- ggplot_build(plot)$data[[1]]
You can then add two geom_area layers to shade in the left and right tails of your curve via:
plot +
  geom_area(data = area.data[which(area.data$x < -1.995), ], aes(x = x, y = y), fill = "skyblue") +
  geom_area(data = area.data[which(area.data$x > 1.995), ], aes(x = x, y = y), fill = "skyblue")
This will give you the following plot:
Note that you can add your geom_vline layer after this (I left it out because it required data you did not supply in your question).
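As an alternative sketch, the curve coordinates can be computed with stats::density() directly, so nothing needs to be extracted from a built plot. This uses ggplot() rather than qplot(), which is deprecated in recent ggplot2 releases. The names vals, dens_df and cutoff are illustrative, and the simulated data stands in for the asker's t.stats. Note that density() extends the curve slightly beyond the range of the data, so the result is close to, but not necessarily identical with, the curve drawn by geom = "density".

```r
library(ggplot2)

set.seed(1)
vals <- rnorm(10000)  # simulated stand-in for t.stats (assumption)

# Kernel density estimate with the same bandwidth adjustment as in the question
dens <- density(vals, adjust = 1.5)
dens_df <- data.frame(x = dens$x, y = dens$y)

cutoff <- 1.995  # tail boundary taken from the question

p <- ggplot(dens_df, aes(x, y)) +
  geom_line() +
  # Each geom_area() layer gets only the rows of one tail
  geom_area(data = subset(dens_df, x < -cutoff), fill = "skyblue") +
  geom_area(data = subset(dens_df, x > cutoff), fill = "skyblue") +
  labs(x = "sampling distribution of t-statistic", y = "density")

p
```

A dashed geom_vline(xintercept = ..., linetype = "dashed") for the observed statistic can be added to p in the same way as in the question.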

Related

Is there a way to plot a density function over a histogram that was plotted using the PlotRelativeFrequency() function in R

I have a vector of sample means and I've been trying to plot a probability histogram using hist(x) and ggplot, but the bin heights exceed 1 (which is unusual for a probability distribution). I then used PlotRelativeFrequency(hist(x)) to force R to plot a histogram of probabilities, and it worked. My problem is that I cannot plot a density function over that histogram: when I use lines(density(x)), the density curve goes way off the graph.
Since your question is tagged with ggplot, I'll give a ggplot answer.
To put the histogram on a density scale, map aes(y = stat(density)) so that the bar areas integrate to 1 (in recent ggplot2 releases, after_stat(density) is the preferred spelling). You can then give stat_function() the density function of any theoretical distribution. The downside is that you have to pre-compute the parameters.
df <- data.frame(x = rnorm(500, 10, 2))
pars <- list(mean = mean(df$x), sd = sd(df$x))
library(ggplot2)
ggplot(df, aes(x)) +
  geom_histogram(binwidth = 1, aes(y = stat(density))) +
  stat_function(fun = function(x) {dnorm(x, mean = pars$mean, sd = pars$sd)})
Next up, we can plot the empirical density using kernel density estimates, which does everything pretty much automatically:
ggplot(df, aes(x)) +
  geom_histogram(binwidth = 1, aes(y = stat(density))) +
  geom_density()
Lastly, you can have a look at this stats function, that essentially automates the first version. Full disclaimer: I'm the author of that github repo.
library(ggnomics)
ggplot(df, aes(x)) +
  geom_histogram(binwidth = 1, aes(y = stat(density))) +
  stat_theodensity()
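A quick base-R check may help explain why bar heights above 1 are not an error on the density scale: it is the bar areas (height times bin width) that sum to 1, not the heights. The sketch below uses simulated data, and the variable names are illustrative only.

```r
set.seed(1)
x <- rnorm(500, 10, 2)

# Histogram with unit-width bins, computed without drawing it
h <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 1), plot = FALSE)

# On the density scale, area = height * bin width, and the areas sum to 1
bar_areas <- h$density * diff(h$breaks)
total_area <- sum(bar_areas)

# With tightly concentrated data the bins are narrow, so heights can exceed 1
narrow <- hist(rnorm(500, 0, 0.1), plot = FALSE)
narrow_area <- sum(narrow$density * diff(narrow$breaks))
max(narrow$density)
```

This is also why lines(density(x)) runs off a relative-frequency histogram: the curve is on the density scale while the bars show per-bin proportions, and the two scales agree only when the bin width is 1.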

ggplot axis: different y-axis on left and right of plot

I have the following data;
https://www.dropbox.com/s/at2f2zni7s1hnzm/results.csv?dl=0
When I plot all three plots using the following;
library(ggplot2)
library(pROC)
library(dplyr)  # provides %>% and filter()
roc <- roc(results$testactual, results$pred)
ggroc(roc) +
  geom_density(data = results %>%
                 filter(testactual == 0), aes(pred), color = 'green') +
  geom_density(data = results %>%
                 filter(testactual == 1), aes(pred), color = 'black')
I am able to obtain 3 plots on the same graph, but the axes are not as I would have hoped.
I am trying to make it so that the y-axis for the density plots is displayed on the right side and the y-axis for the ROC plot is on the left.
Finally, I want to sort the x-axis so that 1 is on the right side and 0 is on the left side (however, I think I can manage this, as I have run into this problem before).
Direct R link to data:
results <- read.csv(url("https://www.dropbox.com/s/at2f2zni7s1hnzm/results.csv?dl=1"))
EDIT: Just plotting the density plots:
Plot of the ROC plot
Use the sec.axis argument of scale_y_continuous(). You can also use ..scaled.. to scale each density to a maximum of 1.
roc <- roc(results$testactual, results$pred)
ggroc(roc) +
  geom_density(data = results %>%
                 filter(testactual == 0), aes(x = pred, y = ..scaled..), color = 'green') +
  geom_density(data = results %>%
                 filter(testactual == 1), aes(x = pred, y = ..scaled..), color = 'black') +
  scale_y_continuous(name = "Density", sec.axis = sec_axis(~., name = "Sensitivity"))
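For readers without the linked CSV, here is a minimal self-contained sketch of the same idea. It uses simulated scores in place of results (an assumption), leaves out the ROC curve, and shows only how scaled densities and a secondary axis fit together. after_stat(scaled) is the current spelling of ..scaled.., and the axis names are chosen so that the density label ends up on the right, as the question asks.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the asker's results data (assumption)
scores <- data.frame(
  pred = c(rbeta(200, 2, 5), rbeta(200, 5, 2)),
  testactual = rep(c(0, 1), each = 200)
)

p <- ggplot(scores, aes(x = pred, colour = factor(testactual))) +
  # Each group's density is rescaled so that its peak is at 1
  geom_density(aes(y = after_stat(scaled))) +
  scale_colour_manual("testactual", values = c("0" = "green", "1" = "black")) +
  # Identity transform: the right-hand axis mirrors the left with its own label
  scale_y_continuous(
    name = "Sensitivity",
    sec.axis = sec_axis(~ ., name = "Scaled density")
  )

p
```

Because both quantities lie in [0, 1] after scaling, the identity transform ~ . is enough here. For quantities on different scales, the formula would carry a conversion factor, for example ~ . * 10.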

Filling under a curve with ggplot graphs

I would like to create a graph with the normal function from x=-2 to x=2 filled under the curve from -2 to 0.
I've tried with ggplot2
qplot(c(-2, 2), stat="function", fun=dnorm, geom="line") +
+ geom_area(aes(xlim=c(-2,0)),stat="function", fun=dnorm)
But I get this graph completely filled instead (the black colour)
How can I get a plot filled only from -2 to 0?
Other options or packages are welcome.
I've also tried with only one command with ggplot and filled option but I can't get it either.
I know some people do it using polygons, but the result is not as smooth and nice.
P.S.: To repeat, the solution I'm looking for does not involve generating x,y coordinates beforehand, but uses the function directly with stat="function", fun=dnorm or similar. Thus, my question is not a duplicate.
I've also tried
ggplot(NULL,aes(x=c(-2,2))) + geom_area(aes(x=c(-2,0)),stat="function", fun=dnorm, fill="red") +
geom_area(aes(x=c(0,2)),stat="function", fun=dnorm, fill="blue")
But again it fills all the curve with a single color, blue. The red half seems to be overwritten. The same with geom_ribbon and other options.
Try this:
ggplot(data.frame(x = c(-2, 2)), aes(x)) +
  stat_function(fun = dnorm) +
  stat_function(fun = dnorm,
                xlim = c(-2, 0),
                geom = "area")
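The same xlim argument can shade both tails of a theoretical curve, one stat_function() layer per tail. The following is a sketch under assumptions: the cutoff qnorm(0.975) and the plotting range of -4 to 4 are illustrative choices, not values from the question.

```r
library(ggplot2)

cutoff <- qnorm(0.975)  # illustrative two-sided 5% cutoff (assumption)

p <- ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  # Full curve as a line
  stat_function(fun = dnorm) +
  # Left tail: the function is evaluated only on xlim, then filled
  stat_function(fun = dnorm, xlim = c(-4, -cutoff),
                geom = "area", fill = "skyblue") +
  # Right tail
  stat_function(fun = dnorm, xlim = c(cutoff, 4),
                geom = "area", fill = "skyblue")

p
```

Since each layer evaluates dnorm only inside its own xlim, no x,y coordinates have to be generated beforehand.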
Can't you generate your distribution data with dnorm instead?
library(ggplot2)
x<-seq(-2,2, 0.01)
y<-dnorm(x,0,1)
xddf <- data.frame(x=x,y=y)
qplot(x, y, data = xddf, geom = "line") +
  geom_ribbon(data = subset(xddf, x > -2 & x < 0), aes(ymax = y), ymin = 0,
              fill = "red", colour = NA, alpha = 0.5) +
  scale_y_continuous(limits = c(0, .4))
These days, with after_stat() and after_scale(), you could also use
a more flexible approach that lets you explicitly map ranges of x values
to filled sections.
For example, filling some normal distribution quantiles:
library(ggplot2)
breaks <- qnorm(c(0, .05, .2, .5, .8, .95, 1))
ggplot(data.frame(x = c(-2, 2)), aes(x)) +
  scale_fill_brewer("x") +
  stat_function(
    n = 512,
    fun = dnorm,
    geom = "area",
    colour = "gray30",
    aes(
      fill = after_stat(x) |> cut(!!breaks),
      group = after_scale(fill)
    )
  )
This approach also works with other statistics, e.g. stat_density() for kernel density estimates:
set.seed(42)
ggplot(data.frame(x = rnorm(1000)), aes(x)) +
  scale_fill_brewer("x") +
  stat_density(
    n = 512,
    geom = "area",
    colour = "gray30",
    aes(
      fill = after_stat(x) |> cut(!!breaks),
      group = after_scale(fill)
    )
  )

ggplot2: how to create correct legend after using scale_xx_manual

I have a plot with three different lines. I want one of those lines to have points on it as well. I also want the two lines without points to be thicker than the one with points. I have managed to get the plot I want, but the legend isn't keeping up.
library(ggplot2)
y <- c(1:10, 2:11, 3:12)
x <- c(1:10, 1:10, 1:10)
testnames <- c(rep('mod1', 10), rep('mod2', 10), rep('meas', 10))
df <- data.frame(testnames, y, x)
ggplot(data = df, aes(x = x, y = y, colour = testnames)) +
  geom_line(aes(size = testnames)) +
  scale_size_manual("", values = c(0.5, 1, 1)) +
  geom_point(aes(alpha = testnames), size = 5, shape = 4) +
  scale_alpha_manual("", values = c(1, 0, 0))
I can remove the second (black) legend:
ggplot(data = df, aes(x = x, y = y, colour = testnames)) +
  geom_line(aes(size = testnames)) +
  scale_size_manual("", values = c(0.5, 1, 1), guide = 'none') +
  geom_point(aes(alpha = testnames), size = 5, shape = 4) +
  scale_alpha_manual("", values = c(1, 0.05, 0.05), guide = 'none')
But what I really want is a merge of the two legends - a legend with colours, cross only on the first variable (meas) and the lines of mod1 and mod2 thicker than the first line. I have tried guide and override, but with little luck.
You don't need transparency to hide the shapes for mod1 and mod2. You can omit these points from the plot and legend by setting their shape to NA in scale_shape_manual:
ggplot(data = df, aes(x = x, y = y, colour = testnames, size = testnames)) +
  geom_line() +
  geom_point(aes(shape = testnames), size = 5) +
  scale_size_manual(values = c(0.5, 2, 2)) +
  scale_shape_manual(values = c(8, NA, NA))
This gives the following plot:
NOTE: I used some more distinct values in the size-scale and another shape in order to better illustrate the effect.
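The unnamed values above are matched to the levels of testnames in alphabetical order (meas, mod1, mod2), which is easy to get wrong. A hedged variant names each value explicitly. It also maps line thickness through the linewidth aesthetic, which assumes ggplot2 3.4.0 or later; on older versions, size and scale_size_manual() as in the answer above apply instead.

```r
library(ggplot2)

# Same toy data as in the question
df <- data.frame(
  testnames = rep(c("mod1", "mod2", "meas"), each = 10),
  x = rep(1:10, times = 3),
  y = c(1:10, 2:11, 3:12)
)

p <- ggplot(df, aes(x = x, y = y, colour = testnames)) +
  geom_line(aes(linewidth = testnames)) +
  geom_point(aes(shape = testnames), size = 5) +
  # Named vectors tie each value to a level, independent of level order
  scale_linewidth_manual(values = c(meas = 0.5, mod1 = 2, mod2 = 2)) +
  # shape = NA drops the points for mod1 and mod2 from the plot and the legend
  scale_shape_manual(values = c(meas = 8, mod1 = NA, mod2 = NA))

p
```

Because colour, linewidth and shape all map the same variable and keep the default legend title, ggplot2 should merge them into a single legend. Expect a warning about removed rows for the points whose shape is NA.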
