I am trying to add an extra line to my geom_freqpoly plot that represents the average count per bin. I tried two different things, but neither was successful.
I tried adding the line as a geom_line but got an error asking whether I had mapped my stat in the wrong layer.
library(tidyverse)
iris %>%
ggplot(aes(x = Petal.Length, y = ..count..)) +
geom_freqpoly(aes(color = Species),
binwidth = 0.2) +
geom_line(aes(yintercept = "mean"))
#> Warning: Ignoring unknown aesthetics: yintercept
#> Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..count...
#> Did you map your stat in the wrong layer?
I tried adding another geom_freqpoly, like:
library(tidyverse)
iris %>%
ggplot() +
geom_freqpoly(aes(x = Petal.Length, y = ..count.., color = Species),
binwidth = 0.2) +
geom_freqpoly(aes(x = Petal.Length, y = mean(..count..), color = "red"), binwidth = 0.2)
But the resulting line is not what I expect.
Using the iris dataset, I would expect the new line to represent the average count across Species for each bin of the defined binwidth (see image below), which is not what I am getting. My understanding is that geom_freqpoly divides a continuous variable (like Petal.Length) into bins (of width 0.2 in this case). So for each bin I want the average count over the species, with a line connecting those points.
Created on 2020-05-23 by the reprex package (v0.3.0)
Based on the edit to your question, maybe this is what you expected.
The problem with your approach is that mean(..count..) simply computes the mean of the whole ..count.. vector, which gives you a single number and therefore a horizontal line. Instead, simply divide ..count.. by the number of groups, i.e. the number of Species.
I'm not completely satisfied with my solution because I would like to avoid the code n_distinct(iris$Species). I tried some approaches with tapply but failed. So, as a first step:
library(tidyverse)
ggplot(iris, aes(x = Petal.Length)) +
geom_freqpoly(aes(color = Species), binwidth = .2) +
geom_freqpoly(aes(y = ..count.. / n_distinct(iris$Species)), color = "red", binwidth = .2)
Created on 2020-05-28 by the reprex package (v0.3.0)
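One possible way to avoid hard-coding n_distinct(iris$Species), offered as a sketch rather than part of the answer above: let ggplot2 do the per-species binning, pull the binned counts back out with layer_data(), and average them per bin. This assumes all species are binned on the same breaks, which should hold here because stat_bin derives the breaks from the shared x-scale range when binwidth is given.

```r
library(dplyr)
library(ggplot2)

# Per-species frequency polygons, as in the answer above
p <- ggplot(iris, aes(x = Petal.Length)) +
  geom_freqpoly(aes(color = Species), binwidth = 0.2)

# layer_data() returns the data ggplot2 computed for layer 1:
# one row per species (group) and bin, with the bin centre in `x`
# and the number of observations in `count` (empty bins appear as 0)
binned <- layer_data(p, 1)

# Average the counts across species at each bin centre
avg <- binned %>%
  group_by(x) %>%
  summarise(mean_count = mean(count), .groups = "drop")

# Overlay the average as a red line; its x aesthetic overrides Petal.Length
p + geom_line(data = avg, aes(x = x, y = mean_count), colour = "red")
```

The number of species never appears explicitly: mean() over the rows of each bin does the division.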
Basically, I am using a variable in my dataset to set the size of the data points in my plot. When I use this variable, R automatically uses 3 divisions: 100,000, 200,000 and 300,000. However, most of my data have a value below 100,000, and the single row with a value of 300,000 makes the jumps between sizes too large, so the graph does not show the information accurately.
Is there a way to define the number of size divisions? What I have tried is changing the scale with the following code:
scale_size_continuous(range = c(5,10))
This just modifies the size of all dots. I do not want to change the scale; what I want is to introduce more sizes. Is there any way I can modify this parameter manually?
If you want more 'points' you need more breaks, e.g.
library(tidyverse)
df <- mtcars
df %>%
ggplot(aes(x = cyl, y = disp, size = disp)) +
geom_point() +
scale_size_continuous(range = c(1,10))
df %>%
ggplot(aes(x = cyl, y = disp, size = disp)) +
geom_point() +
scale_size_continuous(range = c(1,10),
breaks = seq(50, 500, 50))
Created on 2022-10-28 by the reprex package (v2.0.1)
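The same idea applied to data shaped like the question's (most values below 100,000, one row at 300,000) might look like the sketch below. The data frame and its `value` column are hypothetical toy data, and the break positions are just one possible choice that puts more legend steps where most of the data lives.

```r
library(ggplot2)

# Hypothetical toy data: 29 values below 100,000 plus one row at 300,000
set.seed(1)
df <- data.frame(
  x = 1:30,
  y = rnorm(30),
  value = c(runif(29, min = 5000, max = 95000), 300000)
)

p <- ggplot(df, aes(x = x, y = y, size = value)) +
  geom_point() +
  scale_size_continuous(
    range = c(1, 10),
    # denser breaks in the low range, plus one for the outlier
    breaks = c(10000, 25000, 50000, 75000, 100000, 300000),
    labels = scales::label_comma()
  )
p
```

The breaks only control which sizes appear in the legend; the size of each point is still mapped continuously from `value`.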
Does that solve your problem?
I would like to plot a background that captures the density of points in one dimension in a scatter plot. This would serve a similar purpose to a marginal density plot or a rug plot. I have a way of doing it that is not particularly elegant, and I am wondering if there's some built-in functionality I can use to produce this kind of plot.
Mainly there are a few issues with the current approach:
Alpha overlap at boundaries causes banding at lower resolution as seen here. - Primary objective, looking for a geom or other solution that draws a nice continuous band filled with specific colour. Something like geom_density_2d() but with the stat drawn from only the X axis.
"Background" does not cover expanded area, can use coord_cartesian(expand = FALSE) but would like to cover regular margins. - Not a big deal, is nice-to-have but not required.
Setting scale_fill "consumes" the option for the plot, not allowing it to be set independently for the points themselves. - This may not be easily achievable, independent palettes for layers appears to be a fundamental issue with ggplot2.
library(tidyverse)
data(iris)
dns <- density(iris$Sepal.Length)
dns_df <- tibble(
x = dns$x,
density = dns$y
) %>%
mutate(
start = x - mean(diff(x))/2,
end = x + mean(diff(x))/2
)
ggplot() +
geom_rect(
data = dns_df,
aes(xmin = start, xmax = end, fill = density),
ymin = min(iris$Sepal.Width),
ymax = max(iris$Sepal.Width),
alpha = 0.5) +
scale_fill_viridis_c(option = "A") +
geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_rug(data = iris, aes(x = Sepal.Length))
This is a bit of a hacky solution because it (ab)uses knowledge of how objects are parametrised internally to get what you want. It will yield some warnings, but it gets you what you want.
First, we'll use geom_raster() + stat_density(), decorated with some well-chosen after_stat()/stage() delayed evaluation. Normally this would result in a strip of height 1, but by setting the internal parameters ymin/ymax to infinite values, we make the strip extend over the whole height of the plot. Using geom_raster() also resolves the alpha issue you were having.
library(ggplot2)
p <- ggplot(iris) +
geom_raster(
aes(Sepal.Length,
y = mean(Sepal.Width),
fill = after_stat(density),
ymin = stage(NULL, after_scale = -Inf),
ymax = stage(NULL, after_scale = Inf)),
stat = "density", alpha = 0.5
)
#> Warning: Ignoring unknown aesthetics: ymin, ymax
p
#> Warning: Duplicated aesthetics after name standardisation: NA
Next, we add a fill scale, and immediately follow that by ggnewscale::new_scale_fill(). This allows another layer to use a second fill scale, as demonstrated with fill = Species.
p <- p +
scale_fill_viridis_c(option = "A") +
ggnewscale::new_scale_fill() +
geom_point(aes(Sepal.Length, Sepal.Width, fill = Species),
shape = 21) +
geom_rug(aes(Sepal.Length))
p
#> Warning: Duplicated aesthetics after name standardisation: NA
Lastly, to get rid of the padding at the x-axis, we can manually extend the limits and then shrink in the expansion. It allows for an extended range over which the density can be estimated, making the raster fill the whole area. There is some mismatch between how ggplot2 and scales::expand_range() are parameterised, so the exact values are a bit of trial and error.
p +
scale_x_continuous(
limits = ~ scales::expand_range(.x, mul = 0.05),
expand = c(0, -0.2)
)
#> Warning: Duplicated aesthetics after name standardisation: NA
Created on 2022-07-04 by the reprex package (v2.0.1)
This doesn't solve your problem (I'm not sure I understand all the issues correctly), but perhaps it will help:
Background does not cover expanded area, can use coord_cartesian(expand = FALSE) but would like to cover regular margins.
If you make the 'background' larger and use coord_cartesian() you can get the same 'filled-to-the-edges' effect; would this work for your use-case?
Alpha overlap at boundaries causes banding at lower resolution as seen here.
I wasn't able to fix the banding completely, but my approach below appears to reduce it.
Setting scale_fill "consumes" the option for the plot, not allowing it to be set independently for the points themselves.
If you use geom_segment() you can map density to colour, leaving fill available for e.g. the points. Again, not sure if this is a useable solution, just an idea that might help.
library(tidyverse)
data(iris)
dns <- density(iris$Sepal.Length)
dns_df <- tibble(
x = dns$x,
density = dns$y
) %>%
mutate(
start = x - mean(diff(x))/2,
end = x + mean(diff(x))/2
)
ggplot() +
geom_segment(
data = dns_df,
aes(x = start, xend = end,
y = min(iris$Sepal.Width) * 0.9,
yend = max(iris$Sepal.Width) * 1.1,
color = density), alpha = 0.5) +
coord_cartesian(ylim = c(min(iris$Sepal.Width),
max(iris$Sepal.Width)),
xlim = c(min(iris$Sepal.Length),
max(iris$Sepal.Length))) +
scale_color_viridis_c(option = "A", alpha = 0.5) +
scale_fill_viridis_d() +
geom_point(data = iris, aes(x = Sepal.Length,
y = Sepal.Width,
fill = Species),
shape = 21) +
geom_rug(data = iris, aes(x = Sepal.Length))
Created on 2022-07-04 by the reprex package (v2.0.1)
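A further option, not taken from either answer above and offered only as a sketch: draw the band with annotation_raster() and colour it by hand. Because the colours are computed outside ggplot2 (here with scales::col_numeric() and viridisLite::magma(); viridisLite is installed alongside scales), no fill or colour scale is consumed, so fill stays free for the points. ymin = -Inf/ymax = Inf stretch the band over the full panel height, and interpolate = TRUE smooths the boundaries between raster columns.

```r
library(ggplot2)

# Kernel density of the x variable (512 evenly spaced points by default)
dns <- density(iris$Sepal.Length)

# Map density values to semi-transparent colours ourselves,
# so no ggplot2 fill/colour scale is used up by the background
pal <- scales::col_numeric(viridisLite::magma(256), domain = range(dns$y))
band <- matrix(scales::alpha(pal(dns$y), 0.5), nrow = 1)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  # one-row raster spanning the density grid; -Inf/Inf fill the panel height
  annotation_raster(band,
                    xmin = min(dns$x), xmax = max(dns$x),
                    ymin = -Inf, ymax = Inf,
                    interpolate = TRUE) +
  geom_point(aes(fill = Species), shape = 21) +
  geom_rug(sides = "b")
p
```

By default the density grid extends a few bandwidths beyond the data, so the band should also cover the default x expansion; anything outside the panel is clipped. The raster columns are spread evenly between xmin and xmax, so they only approximately line up with the density grid points, which is negligible at 512 points.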
Hi, I am trying to code a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
@Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason your code is not working is that you mix up the layer calls and do not clearly specify (or even mix up) which scatter and which line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting the Antisocial data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set as a scatter plot, i.e. with the geom_point() layer.
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I'll skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend keeping the data= and aes() calls in the geom_xxxx. This avoids confusion.
You may want to experiment with geom_jitter() instead of geom_point() to get a somewhat better presentation of your dataset. The "few" points plotted are the result of many data points in the same position (overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and changing its linetype. You can play around with the colours, etc. to get the aesthetics you are aiming for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. Boxplots use fill for the body (colour sets only the outline). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary; in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1),
size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!
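If "mean ASB vs YOI grouped by Race" means one mean per YOI and Race combination, a minimal sketch could compute that summary first and then draw points and lines coloured by Race. The data frame below is a hypothetical stand-in for Antisocial.csv (same column names as in the question, made-up values), and the name asb_by_race is illustrative only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for Antisocial.csv: Race (0/1), YOI (90/92/94), ASB
set.seed(42)
Antisocial <- data.frame(
  Race = rep(c(0, 1), each = 30),
  YOI  = rep(c(90, 92, 94), times = 20),
  ASB  = round(runif(60, min = 0, max = 4))
)

# One mean per (YOI, Race) combination
asb_by_race <- Antisocial %>%
  group_by(YOI, Race) %>%
  summarise(ASB_mean = mean(ASB), .groups = "drop")

# Mapping colour to a factor also groups the lines by Race
p <- ggplot(asb_by_race, aes(x = YOI, y = ASB_mean, colour = factor(Race))) +
  geom_point(size = 2) +
  geom_line() +
  labs(colour = "Race")
p
```

Compared with the question's code, the grouping happens in the summary (group_by(YOI, Race)) and in the colour mapping, rather than through a group_by() call inside aes().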
This question already has answers here: using stat_function and facet_wrap together in ggplot2 in R (6 answers). Closed 2 years ago.
I'm looking to plot the following histograms:
library(palmerpenguins)
library(tidyverse)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species)
For each histogram, I would like to add a Normal Distribution to each histogram with each species mean and standard deviation.
Of course I'm aware that I could compute the group specific mean and SD before embarking on the ggplot command, but I wonder whether there is a smarter/faster way to do this.
I have tried:
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species) +
stat_function(fun = dnorm)
But this only gives me a thin line at the bottom:
Any ideas?
Thanks!
Edit
I guess what I'm trying to recreate is this simple command from Stata:
hist bill_length_mm, by(species) normal
which gives me this:
I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R
But I'm specifically looking for a short answer that does not require me creating a separate function.
A while ago I sort of automated drawing theoretical densities with a function that I put in the ggh4x package I wrote, which you might find convenient. You just have to make sure that the histogram and the theoretical density are on the same scale (for example, counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 1) +
stat_theodensity(aes(y = after_stat(count))) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
You can vary the bin size of the histogram, but you'd have to adjust the theoretical density count too. Typically you'd multiply by the binwidth.
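The reason for multiplying by the binwidth, in brief: if n observations come from a density f, the expected count in a bin of width h centred at x is

$$
\mathbb{E}[\text{count}] = n \int_{x - h/2}^{x + h/2} f(t)\,dt \approx n\,h\,f(x)
$$

A curve on the counts-per-x-unit scale (n f(x), which is how the answer describes after_stat(count)) therefore has to be multiplied by h to match a histogram with binwidth h; with binwidth = 2 that gives the after_stat(count)*2 in the following code.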
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 2) +
stat_theodensity(aes(y = after_stat(count)*2)) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
Created on 2021-01-27 by the reprex package (v0.3.0)
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density))) +
stat_theodensity() +
facet_wrap(~species)
While the ggh4x package is the way to go in this case, a more generalizable approach is with tapply and the use of the PANEL variable which is added to the data when a facet is applied.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density)), bins = 30) +
facet_wrap(~species) +
geom_line(aes(y = dnorm(bill_length_mm,
mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
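For completeness, the "compute the group-specific mean and SD first" route mentioned in the question can be written without defining a separate function. This is a sketch, not taken from the answers above; it assumes dplyr >= 1.1.0 for reframe(), and norm_curves and its columns x and dens are illustrative names.

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

# One normal curve per species, evaluated on a 200-point grid spanning
# that species' observed bill lengths (reframe() needs dplyr >= 1.1.0)
norm_curves <- penguins %>%
  filter(!is.na(bill_length_mm)) %>%
  group_by(species) %>%
  reframe(
    x = seq(min(bill_length_mm), max(bill_length_mm), length.out = 200),
    dens = dnorm(x, mean = mean(bill_length_mm), sd = sd(bill_length_mm))
  )

# Histogram on the density scale so the curves are directly comparable;
# norm_curves keeps the species column, so facet_wrap() splits it too
penguins %>%
  filter(!is.na(bill_length_mm)) %>%
  ggplot(aes(x = bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1) +
  geom_line(data = norm_curves, aes(x = x, y = dens), inherit.aes = FALSE) +
  facet_wrap(~species)
```
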
I am trying to reproduce this plot with ggplot2 :
From what I understood, you can call it a histogram with a linear color gradient.
I'm stuck on this linear color gradient; I can't figure out how to reproduce it within each column.
I found one workaround in another post here: Trying to apply color gradient on histogram in ggplot
But it is quite an old one and does not look good with my data; also, it is more a "categorical coloring" than a "gradient coloring".
I also found this one: Plot background colour in gradient, but it only applies the gradient to the plot background and not to the columns.
This could be tested using the iris dataset :
ggplot(iris, aes(x=Species, fill=Petal.Width)) +
geom_histogram(stat = "count")
Where the Petal.Width values of each Species would be used as a coloring gradient for each columns of the histogram with a color legend as in the example plot.
Any help is welcome !
As the data is not provided, I use a toy example.
The point is to have two variables: one for colouring (grad) and another for the x-axis (x in the example). You need to use desc() so that higher values are placed higher up in each bar.
library(tidyverse)
n <- 10000
grad <- runif(n, min = 0, max = 100) %>% round()
x <- sample(letters, size = n, replace = TRUE)
tibble(x, grad) %>%
ggplot(aes(x = x, group = desc(grad), fill = grad)) +
geom_bar(stat = 'count') +
scale_fill_viridis_c()
Created on 2020-05-14 by the reprex package (v0.3.0)
Or, using iris, the example is like:
library(tidyverse)
ggplot(iris, aes(x = Species, group = desc(Petal.Width), fill = Petal.Width)) +
geom_bar() +
scale_fill_viridis_c()
Created on 2020-05-14 by the reprex package (v0.3.0)
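To see what the grouping does, the stacked segments can also be built explicitly: count the rows per (Species, Petal.Width) pair and stack them with geom_col(). This sketch is not from the answer above, but it should produce the same bars, with each distinct Petal.Width value becoming one coloured segment, so the "gradient" is as fine as the data's resolution.

```r
library(dplyr)
library(ggplot2)

# One row per (Species, Petal.Width) pair with its number of flowers;
# each row becomes one stacked segment of that species' bar
seg <- iris %>%
  count(Species, Petal.Width)

# group = desc(Petal.Width) stacks the largest values on top, as above
p <- ggplot(seg, aes(x = Species, y = n, fill = Petal.Width,
                     group = desc(Petal.Width))) +
  geom_col() +
  scale_fill_viridis_c()
p
```
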