Why do geom_density and stat_density(geom = "line") give different results?

In the following illustration, why do geom_density and stat_density(geom = "line") give different results?
library(ggplot2)
df <- data.frame(
  x.values = c(
    rnorm(100, mean = 1, sd = 1),
    rnorm(100, mean = 4, sd = 1),
    rnorm(100, mean = 7, sd = 1),
    rnorm(100, mean = 10, sd = 1)
  ),
  mean.values = sort(rep(c(1, 4, 7, 10), 100))
)
p <- ggplot(df, aes(x = x.values, color = mean.values, group = mean.values))
p + geom_density()
p + stat_density(geom = "line")

It's a difference in the position argument. The default in stat_density() is position = "stack", whereas in geom_density() it is position = "identity", so with stat_density() each group's curve is drawn on top of the summed densities of the groups beneath it.
If you call p + stat_density(geom = "line", position = "identity"), you get the same result as geom_density().
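One way to confirm the differing defaults without drawing anything is to inspect the position object stored on each layer. This is only a sketch: it assumes the layer objects returned by geom_density() and stat_density() expose a position field, which is how current ggplot2 versions appear to behave but is an implementation detail rather than a documented guarantee.

```r
library(ggplot2)

# Each call returns a layer object; its `position` field holds the
# position adjustment the layer will apply (assumed internal detail).
pos_geom  <- class(geom_density()$position)[1]
pos_stat  <- class(stat_density(geom = "line")$position)[1]
pos_fixed <- class(stat_density(geom = "line", position = "identity")$position)[1]

print(c(geom_density = pos_geom,
        stat_density = pos_stat,
        stat_density_identity = pos_fixed))
```

If the two defaults differ as described, the first two entries should name different position classes and the third should match the first.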

Related

How can I add a mean bar and a jitter to my dot plots?

I am trying to compare data from three groups and I would like to have a mean bar on every group and some jitter.
first <- c(1, 1.2, 2, 3, 4)
second <- c(5, 6, 7, 8, 9)
third <- c(10, 16, 17, 18, 19)
df <- data.frame(Value = c(first,second),
Cat = c(rep("first",length(first)), rep("second",length(second))),
xseq = c(seq_along(first),seq_along(second)))
library(ggplot2)
ggplot(df, aes(x = Cat, y = Value, color = Cat)) + geom_point()+xlab("")
df <- data.frame(Value = c(first,second, third),
Cat = c(rep("first",length(first)),
rep("second",length(second)),
rep("third",length(third))),
xseq = c(seq_along(first),
seq_along(second),
seq_along(third)))
library(ggplot2)
ggplot(df, aes(x = Cat, y = Value, color = Cat)) + geom_point()+xlab("")

Something like this?
library(ggplot2)
ggplot(df, aes(x = Cat, y = Value, color = Cat)) +
  geom_errorbar(stat = "summary", width = 0.1, color = "black", alpha = 0.5) +
  stat_summary(geom = "point", fun = mean, color = "black") +
  geom_point(position = position_jitter(width = 0.1), shape = 18, size = 4) +
  scale_color_brewer(palette = "Set2") +
  theme_light(base_size = 16)
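If a literal horizontal "mean bar" is wanted instead of a point with error bars, one possible variant is to draw a crossbar whose lower, middle and upper values are all the mean, so it collapses to a single line. This is a sketch rather than the answerer's method; the width and sizes are arbitrary choices, and the name p_bar is only for illustration.

```r
library(ggplot2)

first  <- c(1, 1.2, 2, 3, 4)
second <- c(5, 6, 7, 8, 9)
third  <- c(10, 16, 17, 18, 19)
df <- data.frame(
  Value = c(first, second, third),
  Cat   = rep(c("first", "second", "third"),
              times = c(length(first), length(second), length(third)))
)

p_bar <- ggplot(df, aes(x = Cat, y = Value, color = Cat)) +
  # Using the mean for ymin, y and ymax collapses the crossbar
  # into one horizontal line per group
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.4, color = "black") +
  # Jitter only horizontally so the plotted values stay exact
  geom_point(position = position_jitter(width = 0.1, height = 0), size = 3)

p_bar
```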

How to predefine legend colours based on value range using ggplot2 and RColorBrewer?

I have some data from a range of tests that I'm calculating STEN scores for. I'm aiming to visualise this data as a circular bar plot and would like to set the colour gradient based on STEN score ranges. For example, a score of 0-2 would be a very light colour, 2.1-4 light, 4.1-6 moderate, 6.1-8 dark and 8.1-10 very dark. My code below uses the RColorBrewer package and the "YlGn" palette, but I'm stuck on how to predefine the colour scheme based on these ranges and show it in the plot legend. The example below produces a circular bar plot whose lowest STEN score is 4.8, so I would like this to be shown in the moderate colour, whereas currently it's the lightest. I essentially want the legend to show all five STEN score ranges, irrespective of whether someone's data falls within each range. Hope this makes sense.
library(tidyverse)
library(RColorBrewer)
set.seed(50)
dat <- data.frame(
  par = paste("par", 1:15),
  test_1 = round(rnorm(15, mean = 30, sd = 5), 1),
  test_2 = round(rnorm(15, mean = 30, sd = 5), 1),
  # round_any() comes from plyr, not the tidyverse, so call it explicitly
  test_3 = plyr::round_any(rnorm(15, mean = 90, sd = 5), 2.5),
  test_4 = round(rnorm(15, mean = 5.4, sd = 0.3), 1),
  test_5 = round(rnorm(15, mean = 17, sd = 1.5), 1)
)
sten_dat <- dat %>%
  mutate_if(is.numeric, scale) %>%
  mutate(across(c(2:6), ~ . * 2 + 5.5)) %>%
  mutate(across(where(is.numeric), round, 1)) %>%
  pivot_longer(!par, names_to = "test", values_to = "sten") %>%
  filter(par == "par 1")
ggplot(sten_dat) +
  geom_col(aes(x = str_wrap(test), y = sten, fill = sten),
           position = "dodge2", alpha = 0.7, show.legend = TRUE) +
  coord_polar() +
  scale_y_continuous(limits = c(-1, 11), breaks = seq(0, 10, 2)) +
  scale_fill_gradientn(colours = brewer.pal(name = "YlGn", n = 5))

Simply add limits to your fill scale:
ggplot(sten_dat) +
geom_col(aes(x = str_wrap(test), y = sten, fill = sten),
position = "dodge2", alpha = 0.7, show.legend = TRUE) +
coord_polar() +
scale_y_continuous(limits = c(-1, 11), breaks = seq(0, 10, 2)) +
scale_fill_gradientn(colours = brewer.pal(name = "YlGn", n = 5),
limits = c(0, 10))
If you want the colors to be clearly "binned" in the way you describe, you can use scale_fill_stepsn instead of scale_fill_gradientn:
ggplot(sten_dat) +
geom_col(aes(x = str_wrap(test), y = sten, fill = sten),
position = "dodge2", alpha = 0.7, show.legend = TRUE) +
scale_y_continuous(limits = c(-1, 11), breaks = seq(0, 10, 2)) +
scale_fill_stepsn(colours = brewer.pal(name = "YlGn", n = 5),
limits = c(0, 10), breaks = 0:5 * 2) +
geomtextpath::coord_curvedpolar() +
theme_minimal() +
theme(axis.text.x = element_text(size = 16, face = 2),
panel.grid.major.x = element_blank())
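Another option, if the legend should list the five ranges as separate labelled keys, is to bin the scores into a factor and use a manual fill scale. The following is a rough sketch under stated assumptions: the scores in sten_example are made up for illustration (they are not the asker's data), and the names sten_example, band_labels, band_cols and p_band are hypothetical.

```r
library(ggplot2)
library(RColorBrewer)

# Illustrative scores only (not the asker's data)
sten_example <- data.frame(
  test = paste0("test_", 1:5),
  sten = c(4.8, 5.6, 6.3, 7.9, 9.1)
)

band_labels <- c("0-2", "2.1-4", "4.1-6", "6.1-8", "8.1-10")

# cut() assigns each score to one of five bands; the factor keeps all
# five levels even when some bands contain no scores
sten_example$band <- cut(sten_example$sten,
                         breaks = seq(0, 10, 2),
                         labels = band_labels,
                         include.lowest = TRUE)

# One palette colour per band, matched by name
band_cols <- setNames(brewer.pal(n = 5, name = "YlGn"), band_labels)

p_band <- ggplot(sten_example) +
  geom_col(aes(x = test, y = sten, fill = band),
           alpha = 0.7, show.legend = TRUE) +
  coord_polar() +
  scale_y_continuous(limits = c(-1, 11), breaks = seq(0, 10, 2)) +
  # drop = FALSE keeps bands with no data in the scale
  scale_fill_manual(values = band_cols, drop = FALSE, name = "STEN band")

p_band
```

With this approach a score of 4.8 falls in the "4.1-6" band and takes the middle palette colour. Whether unused bands get a filled legend key can depend on the ggplot2 version, which is why show.legend = TRUE is set explicitly on the layer.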

Remove points with 0 density (no data) in stat_density_2d(geom = 'point')

I have two dataframes, one which I want to make a stat_density_2d plot using a 'raster' geom and one in which I want to use a 'point' geom. For the point geom I want to remove any point where there is no data though, as measured by a point size of 0.
The following is my code:
library(tidyverse)
set.seed(1)
#tibble for raster density plot
df <- tibble(x = runif(1000000, min = -7, max = 5),
y = runif(1000000, min = 0, max = 1000))
#tibble for point density plot
df2 <- tibble(x = runif(20000, min = -2, max = 2),
y = runif(20000, min = 0, max = 500))
#create the density plot
p1 <- ggplot(NULL, aes(x=x, y=y) ) +
stat_density_2d(data = df, aes(fill = stat(density)), geom = "raster", contour = FALSE) +
scale_fill_gradient(low="transparent", high="red") +
stat_density_2d(data = df2, geom = "point", aes(size = ..density..), n = 40, contour = FALSE) +
theme_bw() +
theme(text=element_text(size=18)) +
ylim(0, 1000) + xlim(-7, 5)
p1
In the resulting plot, density points are drawn across the whole panel. Where the points are smallest (outside the bounds specified in the df2 tibble), I don't want any density points to be shown. Is there any way to remove these?

Here's a hack, though I don't know how robust it is to differences in data.
BLUF: add scale_radius(range=c(-1,6)).
I reduced your data a lot so that it doesn't take 5 minutes to render.
set.seed(1)
df <- tibble(x = runif(1000, min = -7, max = 5),
y = runif(1000, min = 0, max = 1000))
df2 <- tibble(x = runif(20, min = -2, max = 2),
y = runif(20, min = 0, max = 500))
Four plots:
1. Your code (my data), no other change;
2. scale_radius();
3. scale_radius(range = c(-0.332088004, 6)); and
4. scale_radius(range = c(-1, 6)).
This is surely a hack, and I don't know how to find a more precise way of filtering out specific levels.
The modified code:
p1 <- ggplot(NULL, aes(x=x, y=y) ) +
stat_density_2d(data = df, aes(fill = stat(density)), geom = "raster", contour = FALSE) +
scale_fill_gradient(low="transparent", high="red") +
stat_density_2d(data = df2, geom = "point", aes(size = ..density..), n = 40, contour = FALSE) +
theme_bw() +
# scale_radius() +
# scale_radius(range = c(-0.332088004, 6)) +
scale_radius(range = c(-1, 6)) +
theme(text=element_text(size=18)) +
ylim(0, 1000) + xlim(-7, 5)
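A possibly less hacky alternative is to set the size to NA wherever the estimated density falls below a threshold, so that those grid points are not drawn. This is a sketch under assumptions, not a tested fix for the original data: the 5% cutoff is an arbitrary, hypothetical value, it relies on after_stat() (available in recent ggplot2 versions), and stat_density_2d() needs the MASS package installed.

```r
library(ggplot2)

set.seed(1)
df2 <- data.frame(x = runif(200, min = -2, max = 2),
                  y = runif(200, min = 0, max = 500))

# Hypothetical cutoff: hide grid points whose density is below
# 5% of the peak density
cutoff <- 0.05

p_filtered <- ggplot(df2, aes(x = x, y = y)) +
  stat_density_2d(
    geom = "point",
    # Grid points below the cutoff get size NA and are not drawn
    aes(size = after_stat(ifelse(density > cutoff * max(density),
                                 density, NA))),
    n = 40, contour = FALSE, na.rm = TRUE
  ) +
  theme_bw() +
  ylim(0, 1000) + xlim(-7, 5)

p_filtered
```

The na.rm = TRUE argument only silences the warning about removed rows. The cutoff would need tuning for real data, since a kernel density estimate is rarely exactly zero.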

Shade block between two lines, values vary with facet_wrap

I'm plotting the relationships between speed and time for four different species (each in a different facet). For each species, I have a range of speeds I'm interested in, and would like to shade the area between the min and max values. However, these ranges are different for the 4th species compared to the first three.
#data to plot as points
species <- sample(letters[1:4], 40, replace = TRUE)
time <- runif(40, min = 1, max = 100)
speed <- runif(40, min = 1, max = 20)
df <- data.frame(species, time, speed)
#ranges of key speeds
sp <- letters[1:4]
minspeed <- c(5, 5, 5, 8)
maxspeed <- c(10, 10, 10, 13)
df.range <- data.frame(sp, minspeed, maxspeed)
ggplot() +
geom_hline(data = df.range, aes(yintercept = minspeed),
colour = "red") +
geom_hline(data = df.range, aes(yintercept = maxspeed),
colour = "red") +
geom_point(data=df, aes(time, speed),
shape = 1) +
facet_wrap(~species) +
theme_bw()
How do I:
1. get geom_hline to plot the max and min ranges only for the correct species, and
2. shade the area between the two lines?
For the latter part, I've tried adding geom_ribbon to my plot, but I keep getting an error message that I'm unsure how to address.
geom_ribbon(data = df,
aes(ymin = minspeed, ymax = maxspeed,
x = c(0.0001, 100)),
fill = "grey",
alpha = 0.5) +
Error: Aesthetics must be either length 1 or the same as the data
(40): x, ymin, ymax

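As per my comment, the following should work: rename the sp column in df.range to species so that the faceting variable is present in both data frames, then use geom_rect for the shading. Perhaps there are other unobserved differences between your actual use case and the example in your question?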
colnames(df.range)[which(colnames(df.range) == "sp")] <- "species"
ggplot() +
geom_hline(data = df.range, aes(yintercept = minspeed),
colour = "red") +
geom_hline(data = df.range, aes(yintercept = maxspeed),
colour = "red") +
geom_point(data = df, aes(time, speed),
shape = 1) +
geom_rect(data = df.range,
aes(xmin = -Inf, xmax = Inf, ymin = minspeed, ymax = maxspeed),
fill = "grey", alpha = 0.5) +
facet_wrap(~species) +
theme_bw()
Data used:
df <- data.frame(species = sample(letters[1:4], 40, replace = TRUE),
time = runif(40, min = 1, max = 100),
speed = runif(40, min = 1, max = 20))
df.range <- data.frame(sp = letters[1:4],
minspeed = c(5, 5, 5, 8),
maxspeed = c(10, 10, 10, 13))
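To check that each facet receives only its own range, the built layer data can be inspected. The sketch below assumes the range table already uses the column name species. The seed and the name p_facets are illustrative additions.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(species = sample(letters[1:4], 40, replace = TRUE),
                 time    = runif(40, min = 1, max = 100),
                 speed   = runif(40, min = 1, max = 20))
# The range table uses the same column name as the faceting variable
df.range <- data.frame(species  = letters[1:4],
                       minspeed = c(5, 5, 5, 8),
                       maxspeed = c(10, 10, 10, 13))

p_facets <- ggplot() +
  geom_rect(data = df.range,
            aes(xmin = -Inf, xmax = Inf, ymin = minspeed, ymax = maxspeed),
            fill = "grey", alpha = 0.5) +
  geom_point(data = df, aes(time, speed), shape = 1) +
  facet_wrap(~species) +
  theme_bw()

# Extract the rectangles as assigned to panels: one row per panel,
# each with that species' own limits
rects <- layer_data(p_facets, 1)
rects <- rects[order(rects$PANEL), c("PANEL", "ymin", "ymax")]
print(rects)
```

If the column were still named sp, the layer would have no faceting variable and every rectangle would be repeated in all four panels.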

Multiple histograms with non-integer frequencies in R using ggplot

I'm trying to find a way to plot multiple histograms of non-integer frequencies in R. For example:
a = c(1,2,3,4,5)
a_freq = c(1.5, 2.5, 3.5, 4.5, 5.5)
b = c(2, 4, 6, 8, 10)
b_freq = c(2.5, 5, 6, 7, 8)
using something like
qplot(x = a, weight = a_freq, geom = "histogram")
works, but how do I superimpose b (and b_freq) onto this? Any ideas?
This is what we would do if the frequencies are integer values:
require(ggplot2)
require(reshape2)
set.seed(1)
df <- data.frame(x = rnorm(n = 1000, mean = 5, sd = 2), y = rnorm(n = 1000, mean = 2), z = rnorm(n = 1000, mean = 10))
ggplot(melt(df), aes(value, fill = variable)) + geom_histogram(position = "dodge")
Is there something similar for non-integer values?
Thanks,
Karan

I'm still not entirely sure what you're trying to do, so here are four options:
library(ggplot2)
a = c(1,2,3,4,5)
a_freq = c(1.5, 2.5, 3.5, 4.5, 5.5)
b = c(2, 4, 6, 8, 10)
b_freq = c(2.5, 5, 6, 7, 8)
dat <- data.frame(x = c(a,b),
freq = c(a_freq,b_freq),
grp = rep(letters[1:2],each = 5))
ggplot(dat,aes(x = x,weight = freq,fill = grp)) +
geom_histogram(position = "dodge")
ggplot(dat,aes(x = x,y = freq,fill = grp)) +
geom_bar(position = "dodge",stat = "identity",width = 0.5)
ggplot(dat,aes(x = x,y = freq,fill = grp)) +
facet_wrap(~grp) +
geom_bar(stat = "identity",width = 0.5)
ggplot() +
geom_bar(data = dat[dat$grp == 'a',],aes(x = x,y = freq),
fill = "blue",
alpha = 0.5,
stat = "identity",
width = 0.5) +
geom_bar(data = dat[dat$grp == 'b',],aes(x = x,y = freq),
fill = "red",
alpha = 0.5,
stat = "identity",
width = 0.5)
If you have discrete x values and precomputed "heights", that is not a histogram but a bar plot, so I would opt for one of the geom_bar options.
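To see what the weight aesthetic does in the first option, the computed bin heights can be pulled out of the built plot. This is a small sketch using only the a and a_freq vectors from the question. The binwidth of 1 and the names dat_a, p_weighted and bins are choices made for illustration.

```r
library(ggplot2)

a      <- c(1, 2, 3, 4, 5)
a_freq <- c(1.5, 2.5, 3.5, 4.5, 5.5)
dat_a  <- data.frame(x = a, freq = a_freq)

# With weight mapped, each bin's height is the sum of the weights of
# the observations falling in it rather than a count of rows
p_weighted <- ggplot(dat_a, aes(x = x, weight = freq)) +
  geom_histogram(binwidth = 1)

# Bin edges and heights as computed by the histogram stat
bins <- layer_data(p_weighted, 1)
print(bins[, c("xmin", "xmax", "count")])
```

Because each x value lands in its own bin here, the non-empty bin heights simply reproduce a_freq, which is the same picture a bar plot of the precomputed heights would give.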
