I have some dummy data and am able to create a bar chart and a stacked bar chart:
# some data
egdf <- data.frame(
ch = c('a', 'b', 'c'),
N = c(100, 110, 120),
M = c(10, 15, 20)
)
Looks like this:
egdf
ch N M
1 a 100 10
2 b 110 15
3 c 120 20
Now some charts:
# bar chart
library(ggplot2)
library(dplyr) # provides %>%
library(tidyr) # provides pivot_longer()
ggplot(egdf, aes(x = ch, y = N)) +
geom_bar(stat = 'identity')
# stacked bar chart
egdf %>%
pivot_longer(cols = c(N, M), names_to = 'metric') %>%
ggplot(aes(x = ch, y = value, fill = metric)) +
geom_bar(stat = 'identity')
My question is, is there a way to create the stacked bar chart from egdf directly without having to first transform with pivot_longer()?
[EDIT]
Why am I asking for this? My actual dataframe has some additional fields which are based on calculations off the current structure, e.g. it looks more like this:
egdf <- data.frame(
ch = c('a', 'b', 'c'),
N = c(120, 110, 100),
M = c(10, 15, 20)
) %>%
mutate(drop = N - lag(N),
drop_pct = scales::percent(drop / N),
Rate = scales::percent(M / N))
egdf
ch N M drop drop_pct Rate
1 a 120 10 NA <NA> 8.3%
2 b 110 15 -10 -9.09% 13.6%
3 c 100 20 -10 -10.00% 20.0%
In my plot, I'm adding some additional geoms. If I were to use pivot_longer(), these relationships would be broken. If I could somehow tell ggplot to make a stacked bar based only on feature1 and feature2 (N and M in the example), it would be much easier for this particular use case.
Update: see the valuable comment from stefan:
ggplot(egdf, aes(x=ch, y=N+M)) +
geom_col(aes(fill="N")) +
geom_col(aes(x=ch, y=M, fill="M")) +
ylab("N") +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10))
First answer:
Are you looking for a solution like this?
ggplot(egdf, aes(x=ch, y=N)) +
geom_col(aes(fill="N")) +
geom_col(aes(x=ch, y=M, fill="M"))
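The two snippets above differ in a way that is easy to miss: the first answer overlays the M bar on the N bar, while the updated version draws N + M first so that the visible N segment keeps its full height. Here is a minimal sketch of the difference, assuming the question's first egdf (the names p_overlay, p_stacked and tops are mine, not from the answers):

```r
library(ggplot2)

egdf <- data.frame(
  ch = c("a", "b", "c"),
  N  = c(100, 110, 120),
  M  = c(10, 15, 20)
)

# Overlay: the M bar covers the bottom of the N bar, so the total height is N.
p_overlay <- ggplot(egdf, aes(x = ch)) +
  geom_col(aes(y = N, fill = "N")) +
  geom_col(aes(y = M, fill = "M"))

# Stack look-alike: draw N + M first, then M in front of it, so the visible
# N segment spans M..(N + M) and the total height is N + M.
p_stacked <- ggplot(egdf, aes(x = ch)) +
  geom_col(aes(y = N + M, fill = "N")) +
  geom_col(aes(y = M, fill = "M"))

# Bar tops that each version should reach
tops <- data.frame(ch = egdf$ch, overlay = egdf$N, stacked = egdf$N + egdf$M)
tops
```

Because the data frame is never reshaped, columns such as drop_pct and Rate stay on the same rows and remain available to any extra geoms.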
Suppose I wish to make a range plot with the design below using ggplot with the following dummy data:
with following legend.
set.seed(1)
test.dat <- data.frame(
yval = sample(1:100, 40),
xcat = rep(LETTERS[1:4], 10),
base = sample(c(1, 0),40, replace=T),
col = rep(c("red", "blue"), 20)
)
> head(test.dat)
yval xcat base col
1 68 A 0 red
2 39 B 0 blue
3 1 C 0 red
4 34 D 1 blue
5 87 A 0 red
6 43 B 0 blue
The gray portion shows the range of the data where base == 1 and the whisker-like line (that resembles errorbar) shows the range of the data where base == 0 using the respective color designed for each xcat.
So using this dummy data, I would expect:
minmax <- function(x){
return(
c(min(x),max(x))
)
}
> minmax(test.dat[test.dat$xcat == "D" & test.dat$base == 1,]$yval)
[1] 24 99
> minmax(test.dat[test.dat$xcat == "D" & test.dat$base == 0,]$yval)
[1] 21 82
> unique(test.dat[test.dat$xcat == "D",]$col)
[1] "blue"
for xcat == "D", a gray bar to range from 24 to 99, and a blue whisker line to range from 21 to 82.
How can I achieve this? It looks like there is no straightforward ggplot function to create a range plot.
My idea was to adjust geom_boxplot's hinge and whisker definitions for the gray part, and to use geom_line or geom_linerange for the whisker-line part, but I am unsure how to do that.
Thank you.
First, create a data frame that holds the min and max for each combination of xcat, base and col:
library(dplyr)
library(ggplot2)
data2 <- test.dat %>% group_by(xcat, base, col) %>% summarise(min = min(yval), max=max(yval))
Then you use geom_linerange for the gray "bars" and geom_errorbar for the whisker line:
ggplot()+
geom_linerange(data= data2 %>% filter(base==1), aes(x= xcat, ymin=min, ymax=max), size=12, alpha=0.5)+
geom_errorbar(data= data2 %>% filter(base==0), aes(x= xcat, ymin=min, ymax=max), colour=data2[data2$base==0,]$col, width=.2)
And this is the plot.
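A variation on the approach above, as a sketch rather than a definitive version: instead of passing a colour vector outside aes(), the col column can be mapped inside aes() and used literally via scale_colour_identity(). This assumes ggplot2 >= 3.4 for the linewidth argument (older versions use size), and the names ranges, ymin and ymax are my own:

```r
library(dplyr)
library(ggplot2)

set.seed(1)
test.dat <- data.frame(
  yval = sample(1:100, 40),
  xcat = rep(LETTERS[1:4], 10),
  base = sample(c(1, 0), 40, replace = TRUE),
  col  = rep(c("red", "blue"), 20)
)

# One row per (xcat, base, col) with the range of yval
ranges <- test.dat %>%
  group_by(xcat, base, col) %>%
  summarise(ymin = min(yval), ymax = max(yval), .groups = "drop")

p <- ggplot(mapping = aes(x = xcat, ymin = ymin, ymax = ymax)) +
  # gray bar: range where base == 1
  geom_linerange(data = filter(ranges, base == 1),
                 linewidth = 12, alpha = 0.5) +
  # coloured whisker: range where base == 0, colour taken from the col column
  geom_errorbar(data = filter(ranges, base == 0),
                aes(colour = col), width = 0.2) +
  scale_colour_identity()
```

Mapping the colour per row keeps each whisker tied to its own xcat even if some category has no base == 0 or base == 1 rows.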
I would suggest doing some reshaping first using dplyr/tidyr, and then geom_tile:
library(tidyverse)
test.dat %>%
group_by(xcat, base, col) %>%
summarize(mid = mean(range(yval)),
range = diff(range(yval)), .groups = "drop") %>%
pivot_wider(names_from = base, values_from = mid:range) %>%
ggplot(aes(x = xcat)) +
geom_tile(aes(y = mid_0, height = range_0), fill = "gray70", color = "black") +
geom_tile(aes(y = mid_1, height = range_1, fill = col), color = "black") +
scale_fill_identity()
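The geom_tile approach relies on a tile being centred at y and extending height / 2 in each direction, so a tile centred on the midpoint of the range with the range as its height covers exactly min to max. A small check of that arithmetic, using made-up values (yval here is illustrative, not taken from test.dat):

```r
yval <- c(21, 47, 82)            # made-up values for one xcat/base group

mid    <- mean(range(yval))      # centre of the tile: (21 + 82) / 2 = 51.5
height <- diff(range(yval))      # height of the tile: 82 - 21 = 61

# The tile spans mid - height/2 .. mid + height/2
tile_span <- c(mid - height / 2, mid + height / 2)
tile_span                        # 21 82, i.e. min(yval) and max(yval)
```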
I've got a dataset similar to this:
x <- 100 - abs(rnorm(1e6, 0, 5))
y <- 50 + rnorm(1e6, 0, 3)
dist <- sqrt((x - 100)^2 + (y - 50)^2)
z <- exp(-(dist / 8)^2)
which can be visualised as follows:
data.frame(x, y, z) %>%
ggplot() + geom_point(aes(x, y, color = z))
What I would like to do is a stacked half-circle plot with averaged value of z in subsequent layers. I think it can be done with the combination of geom_col and coord_polar(), although the farthest I can get is
data.frame(x, y, z, dist) %>%
mutate(dist_fct = cut(dist, seq(0, max(dist), by = 5))) %>%
ggplot() + geom_bar(aes(x = 1, y = 1, fill = dist_fct), stat = 'identity', position = 'fill') +
coord_polar()
which is obviously far from the expectation (layers should be of equal size, plot should be clipped on the right half).
The problem is that I can't really use coord_polar() because I later need annotation_custom(). So my questions are:
Can a plot like this be made without coord_polar()?
If not, how can it be done with coord_polar()?
The result should be similar to the graphic below, except that instead of plotting layers built from individual points I would like to plot each layer as a whole, coloured by the average value of z inside that layer.
If you want simple radius bands, perhaps something like this would work as you pictured it in your question:
# your original sample data
x <- 100 - abs(rnorm(1e6, 0, 5))
y <- 50 + rnorm(1e6, 0, 3)
dist <- sqrt((x - 100)^2 + (y - 50)^2)
nbr_bands <- 6 # set nbr of bands to plot
# calculate width of bands
band_width <- max(dist)/(nbr_bands-1)
# integer division of dist by band_width yields an integer from 0 to nbr_bands - 1
# as.factor makes it categorical, which is what you want for the plot
band = as.factor(dist %/% (band_width))
library(dplyr)
library(ggplot2)
data.frame(x, y, band) %>%
ggplot() + geom_point(aes(x, y, color = band)) + coord_fixed() +
theme_dark() # dark theme
Edit to elaborate:
As you first attempted, it would be nice to use the very handy cut() function to calculate the radius color categories.
One way to get categorical (discrete) colors, rather than continuous shading, for your plot color groups is to set your aes color= to a factor column.
cut() already returns a factor; to get an ordered factor directly, use the option ordered_result=TRUE:
band <- cut(dist, nbr_bands, ordered_result=TRUE, labels=1:nbr_bands) # also use `labels=` to specify your own labels
data.frame(x, y, band) %>%
ggplot() + geom_point(aes(x, y, color = band)) + coord_fixed()
Or, more simply, call cut() with labels=FALSE to get integer band codes and convert them to a factor using as.factor():
band <- as.factor( cut(dist, nbr_bands, labels=FALSE) )
data.frame(x, y, band) %>%
ggplot() + geom_point(aes(x, y, color = band)) + coord_fixed()
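The two banding methods above do not split the data the same way, which a tiny example makes visible. This is only a sketch with a hand-picked dist vector (my assumption, chosen so the arithmetic is exact):

```r
dist      <- c(0, 2.4, 5, 7.5, 10)
nbr_bands <- 5

# Method 1: integer division. The band width is max/(nbr_bands - 1), so the
# codes run from 0 to nbr_bands - 1 and only the maximum lands in the last band.
band_width <- max(dist) / (nbr_bands - 1)   # 2.5
band_div   <- dist %/% band_width           # 0 0 2 3 4

# Method 2: cut() into nbr_bands equal-width intervals over the data range.
# With labels = FALSE the result is an integer code from 1 to nbr_bands.
band_cut <- cut(dist, nbr_bands, labels = FALSE)   # 1 2 3 4 5
```

With cut() the outermost band has the same width as the others, which is usually what you want for equal-sized rings.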
Sounds like you may find the circle & arc plotting functions from the ggforce package useful:
# data
library(dplyr)
library(ggplot2)
library(ggforce) # provides geom_arc_bar()
set.seed(1234)
df <- data.frame(x = 100 - abs(rnorm(1e6, 0, 5)),
y = 50 + rnorm(1e6, 0, 3)) %>%
mutate(dist = sqrt((x - 100)^2 + (y - 50)^2)) %>%
mutate(z = exp(-(dist / 8)^2))
# define cut-off values
cutoff.values <- seq(0, ceiling(max(df$dist)), by = 5)
df %>%
# calculate the mean z for each distance band
mutate(dist_fct = cut(dist, cutoff.values)) %>%
group_by(dist_fct) %>%
summarise(z = mean(z)) %>%
ungroup() %>%
# add the cutoff values to the dataframe for inner & outer radius
arrange(dist_fct) %>%
mutate(r0 = cutoff.values[-length(cutoff.values)],
r = cutoff.values[-1]) %>%
# add coordinates for circle centre
mutate(x = 100, y = 50) %>%
# plot
ggplot(aes(x0 = x, y0 = y,
r0 = r0, r = r,
fill = z)) +
geom_arc_bar(aes(start = 0, end = 2 * pi),
color = NA) + # hide outline
# force equal aspect ratio in order to get true circle
coord_equal(xlim = c(70, 100), expand = FALSE)
Plot generation took <1s on my machine. Yours may differ.
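One detail worth checking in this kind of pipeline is that the cut-off vector covers the largest distance; otherwise cut() returns NA for the outermost points and the r0 / r columns no longer line up with the bands. Below is a base-R sketch of one way to guard against that, assuming a smaller stand-in data set (the names step, cutoffs and rings are mine):

```r
set.seed(42)
dist <- abs(rnorm(1000, 0, 5))      # stand-in distances (assumption)
z    <- exp(-(dist / 8)^2)
step <- 5

# Round the upper limit up to the next multiple of step so every point
# falls inside a band, and include dist == 0 in the first band.
cutoffs <- seq(0, step * ceiling(max(dist) / step), by = step)
band    <- cut(dist, cutoffs, include.lowest = TRUE, labels = FALSE)

# Mean z per band, with inner and outer radius looked up from the band index
rings <- aggregate(z, by = list(band = band), FUN = mean)
names(rings)[2] <- "z"
rings$r0 <- cutoffs[rings$band]
rings$r  <- cutoffs[rings$band + 1]
rings
```

Looking up r0 and r by band index, rather than by row position, also stays correct if some band happens to be empty.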
I'm not sure this satisfies everything, but it should be a start. To cut down on the time for plotting, I'm summarizing the data into a grid, which lets you use geom_raster. I don't entirely understand the breaks and everything you're using, so you might want to tweak some of how I divided the data for making the distinct bands. I tried out a couple ways with cut_interval and cut_width--this would be a good place to plug in different options, such as the number or width of bands.
Since you mentioned getting the average z for each band, I'm grouping by the gridded x and y and the cut dist, then using mean of z for setting bands. I threw in a step to make labels like in the example--you probably want to reverse them or adjust their positioning--but that comes from getting the number of each band's factor level.
library(tidyverse)
set.seed(555)
n <- 1e6
df <- tibble(
x = 100 - abs(rnorm(n, 0, 5)),
y = 50 + rnorm(n, 0, 3),
dist = sqrt((x - 100)^2 + (y - 50)^2),
z = exp(-(dist / 8)^2)
) %>%
mutate(brk = cut(dist, seq(0, max(dist), by = 5), include.lowest = T))
summarized <- df %>%
filter(!is.na(brk)) %>%
mutate(x_grid = floor(x), y_grid = floor(y)) %>%
group_by(x_grid, y_grid, brk) %>%
summarise(avg_z = mean(z)) %>%
ungroup() %>%
# mutate(z_brk = cut_width(avg_z, width = 0.15)) %>%
mutate(z_brk = cut_interval(avg_z, n = 9)) %>%
mutate(brk_num = as.numeric(z_brk))
head(summarized)
#> # A tibble: 6 x 6
#> x_grid y_grid brk avg_z z_brk brk_num
#> <dbl> <dbl> <fct> <dbl> <fct> <dbl>
#> 1 75 46 (20,25] 0.0000697 [6.97e-05,0.11] 1
#> 2 75 47 (20,25] 0.000101 [6.97e-05,0.11] 1
#> 3 75 49 (20,25] 0.0000926 [6.97e-05,0.11] 1
#> 4 75 50 (20,25] 0.0000858 [6.97e-05,0.11] 1
#> 5 75 52 (20,25] 0.0000800 [6.97e-05,0.11] 1
#> 6 76 51 (20,25] 0.000209 [6.97e-05,0.11] 1
To make the labels, summarize that data to have a single row per band--I did this by taking the minimum of the gridded x, then using the average of y so they'll show up in the middle of the plot.
labels <- summarized %>%
group_by(brk_num) %>%
summarise(min_x = min(x_grid)) %>%
ungroup() %>%
mutate(y_grid = mean(summarized$y_grid))
head(labels)
#> # A tibble: 6 x 3
#> brk_num min_x y_grid
#> <dbl> <dbl> <dbl>
#> 1 1 75 49.7
#> 2 2 88 49.7
#> 3 3 90 49.7
#> 4 4 92 49.7
#> 5 5 93 49.7
#> 6 6 94 49.7
geom_raster is great for these situations where you have data in an evenly spaced grid that just needs uniform tiles at each position. At this point, the summarized data has 595 rows, instead of the original 1 million, so the time to plot shouldn't be an issue.
ggplot(summarized) +
geom_raster(aes(x = x_grid, y = y_grid, fill = z_brk)) +
geom_label(aes(x = min_x, y = y_grid, label = brk_num), data = labels, size = 3, hjust = 0.5) +
theme_void() +
theme(legend.position = "none", panel.background = element_rect(fill = "gray40")) +
coord_fixed() +
scale_fill_brewer(palette = "PuBu")
Created on 2018-11-04 by the reprex package (v0.2.1)
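The gridding step is what makes the plot cheap: flooring x and y assigns every point to a 1 x 1 cell, and only one averaged value per cell is drawn. A base-R sketch of just that step, assuming a smaller sample than the 1e6 points above (the name grid is mine):

```r
set.seed(555)
n <- 1e4                               # smaller than the 1e6 above, for speed
x <- 100 - abs(rnorm(n, 0, 5))
y <- 50 + rnorm(n, 0, 3)
z <- exp(-(sqrt((x - 100)^2 + (y - 50)^2) / 8)^2)

# One row per occupied grid cell, holding the mean z of the points inside it
grid <- aggregate(z, by = list(x_grid = floor(x), y_grid = floor(y)), FUN = mean)
names(grid)[3] <- "avg_z"

c(points = n, cells = nrow(grid))
```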
Consider this simple example
library(dplyr)
library(forcats)
library(ggplot2)
mydata <- tibble(cat1 = c(1,1,2,2),
cat2 = c('a','b','a','b'),
value = c(10,20,-10,-20),
time = c(1,2,1,2))
mydata <- mydata %>% mutate(cat1 = factor(cat1),
cat2 = factor(cat2))
> mydata
# A tibble: 4 x 4
cat1 cat2 value time
<fct> <fct> <dbl> <dbl>
1 1 a 10.0 1.00
2 1 b 20.0 2.00
3 2 a -10.0 1.00
4 2 b -20.0 2.00
Now, I want to create a chart where I interact the two factor variables.
I know I can use interaction() in ggplot2 (see below).
My big problem is that I do not know how to automate the labeling (and the colouring) of the interactions so that I can avoid any manual error using scale_colour_manual.
For instance:
ggplot(mydata,
aes(x = time, y = value, col = interaction(cat1, cat2) )) +
geom_point(size=15) + theme(legend.position="bottom")+
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme(legend.position="bottom",
legend.text=element_text(size=12, face = "bold")) +
scale_colour_manual(name = ""
, values=c("red","red4","royalblue","royalblue4")
, labels=c("1-b","1-a"
,"2-a","2-b"))
shows:
which has the wrong labels because of a deliberate mistake I made in scale_colour_manual(). Indeed, the bright red dot is 1-a and not 1-b (note how the labels are simply the concatenation of the factor levels). The point is that with more factor levels, guessing the right order can be tricky.
Is there a way to automate this labeling (even better: labeling AND coloring)? Perhaps using forcats? Perhaps creating the labels as strings in the dataframe beforehand?
Thanks!
If the number of factor levels for cat1 / cat2 are not fixed (but could potentially be much larger than 2), I would try to calculate the appropriate colours with hsv(), rather than assign them manually.
The colour cheatsheet here summarises the HSV colour model rather nicely:
Hue (h) is essentially your rainbow colour wheel, Saturation (s) determines how intense the colour is, and Value (v) how dark it is. Each parameter accepts values in the range [0, 1].
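To get a feel for the three parameters, it may help to vary one at a time with base R's hsv(). This is a small illustrative sketch; the swatch names are mine:

```r
swatches <- c(
  red       = hsv(h = 0,   s = 1,   v = 1),    # hue 0 is red
  cyan      = hsv(h = 0.5, s = 1,   v = 1),    # halfway round the wheel
  pale_cyan = hsv(h = 0.5, s = 0.5, v = 1),    # same hue, lower saturation
  dark_cyan = hsv(h = 0.5, s = 1,   v = 0.5)   # same hue, lower value
)
swatches
```

Lowering saturation mixes the colour towards white, while lowering value mixes it towards black.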
Here's how I would adapt it for this use case:
mydata2 <- mydata %>%
# use "-" instead of the default "." since we are using that for the labels anyway
mutate(interacted.variable = interaction(cat1, cat2, sep = "-")) %>%
# cat1: assign hue evenly across the whole wheel,
# cat2: restrict both saturation & value to the range (0.3, 1], as colours can look too
# faint / dark otherwise
mutate(colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
# create the vector of colours for scale_colour_manual()
manual.colour <- mydata2 %>% select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
> colour.vector
1-a 1-b 2-a 2-b
"#3AA6A6" "#00FFFF" "#A63A3A" "#FF0000"
With the colours calculated automatically for any number of factors, plotting becomes quite straightforward:
ggplot(mydata2,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector)) +
theme(legend.position = "bottom")
An illustration with more factor levels (the code is the same except for the addition of guide_legend(byrow = TRUE) in the colour scale):
mydata3 <- data.frame(
cat1 = factor(rep(1:3, times = 5)),
cat2 = factor(rep(LETTERS[1:5], each = 3)), # factor, so that as.integer() and levels() work below
value = 1:15,
time = 15:1
) %>%
mutate(interacted.variable = interaction(cat1, cat2, sep = "-"),
colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
manual.colour <- mydata3 %>% arrange(cat1, cat2) %>%
select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
ggplot(mydata3,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector),
guide = guide_legend(byrow = TRUE)) +
theme(legend.position = "bottom")
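The colour calculation is repeated for each data set above, so it could be wrapped in a small helper. The function below is a hypothetical sketch (make_palette is my name, not part of any package) that builds the named colour vector for every combination of two factors, using the same hue / saturation / value formula as the answer:

```r
# Hypothetical helper: named colour vector for all level combinations of f1, f2
make_palette <- function(f1, f2, sep = "-") {
  f1 <- factor(f1)
  f2 <- factor(f2)
  n1 <- nlevels(f1)
  n2 <- nlevels(f2)
  # every combination of level indices (first factor varies fastest)
  idx <- expand.grid(l1 = seq_len(n1), l2 = seq_len(n2))
  cols <- hsv(h = idx$l1 / n1,
              s = 0.3 + 0.7 * idx$l2 / n2,
              v = 0.3 + 0.7 * idx$l2 / n2)
  # names match the labels produced by interaction(f1, f2, sep = sep)
  names(cols) <- paste(levels(f1)[idx$l1], levels(f2)[idx$l2], sep = sep)
  cols
}

pal <- make_palette(c(1, 1, 2, 2), c("a", "b", "a", "b"))
pal
```

The result can be passed as values = pal to scale_colour_manual(); because the vector is named, ggplot2 matches colours to levels by name rather than by position.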
This is what my data frame looks like:
CatA CatB CatC
1 Y A
1 N B
1 Y C
2 Y A
3 N B
2 N C
3 Y A
4 Y B
4 N C
5 N A
5 Y B
I want to have CatA on the x-axis and its count on the y-axis. That graph comes out fine. However, I want to group by CatB and stack by CatC, keeping the count on the y-axis. This is what I have tried, and this is how it looks:
I want it to look like this:
My code:
ggplot(data, aes(factor(CatA), CatB, fill = CatC)) +
geom_bar(stat = "identity", position = "stack") +
theme_bw() + facet_grid(~ CatC)
PS: I am sorry for providing links to images; I am not able to upload them, because imgur gives me an error every time I try.
You could use facets:
df <- data.frame(A = sample(1:5, 30, T),
B = sample(c('Y', 'N'), 30, T),
C = rep(LETTERS[1:3], 10))
ggplot(df) + geom_bar(aes(B, fill = C), position = 'stack', width = 0.9) +
facet_wrap(~A, nrow = 1) + theme(panel.spacing = unit(0, "lines"))
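Applied to the data frame from the question rather than random data, the same idea might look like the sketch below. The data frame is typed in from the table in the question; counts is my own name and is only there to show what geom_bar() tallies when no y is given:

```r
library(ggplot2)

data <- data.frame(
  CatA = c(1, 1, 1, 2, 3, 2, 3, 4, 4, 5, 5),
  CatB = c("Y", "N", "Y", "Y", "N", "N", "Y", "Y", "N", "N", "Y"),
  CatC = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B")
)

# What geom_bar() counts: rows per (CatA, CatB, CatC) combination
counts <- as.data.frame(table(data), stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# One panel per CatA, bars per CatB, stacked by CatC
p <- ggplot(data, aes(x = CatB, fill = CatC)) +
  geom_bar(position = "stack") +
  facet_wrap(~ CatA, nrow = 1) +
  theme_bw()
```

Leaving y unmapped lets geom_bar() use its default count statistic, which is why stat = "identity" is not needed here.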
I have a dataframe as shown below:
ID AC AF Type
1 60 1 0.00352113 1
2 48 1 0.00352113 2
3 25 1 0.00352113 1
4 98 1 0.00352113 2
5 24 1 0.00352113 1
6 64 2 0.00704225 1
I need to plot a step curve of AF on the x-axis with its frequency on the y-axis, colored by TYPE. I managed to get a histogram using the code below:
ggplot(data, aes(x = AF,fill=TYPE))+geom_histogram(aes(y = ..count..),bins=40)
However, I need a curve plot, as shown below, instead of a histogram:
Any suggestions to achieve this?
We can use geom_line with stat = 'count':
First I generate some dummy data:
set.seed(123)
df1 <- data.frame(Type = sample(1:3, 100, replace = T),
AF = sample(1:10, 100, replace = T,
prob = seq(.8, .2, length.out = 10)))
Then we make the plot:
ggplot(df1, aes(x = AF))+
geom_line(stat = 'count', aes(group = Type, colour = factor(Type)))
Here's an alternative (HT to @eipi)
set.seed(123)
df1 <- data.frame(Type = sample(1:3, 1000, replace = T),
AF = round(rnorm(1000), 3))
ggplot(df1, aes(x = AF))+
geom_step(stat = 'bin', aes(group = Type, colour = factor(Type)),
bins = 35)
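If AF takes a small set of discrete values, another option is to tabulate the counts yourself and plot them with geom_step(). A sketch under that assumption (counts is my own name): unlike stat = 'count', table() keeps combinations that never occur as explicit zeros, so each line drops to zero instead of skipping the gap.

```r
library(ggplot2)

set.seed(123)
df1 <- data.frame(Type = sample(1:3, 100, replace = TRUE),
                  AF   = sample(1:10, 100, replace = TRUE))

# Frequency of every AF value within every Type, zeros included
counts <- as.data.frame(table(AF = df1$AF, Type = df1$Type))
counts$AF <- as.numeric(as.character(counts$AF))   # factor labels back to numbers

# Type is a factor here, so it gives one discrete colour (and group) per type
p <- ggplot(counts, aes(x = AF, y = Freq, colour = Type)) +
  geom_step()
```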
In base graphics you can do this:
set.seed(1)
AF<-sample(1:20,1000,replace=TRUE)
set.seed(2)
TYPE<-sample(c(1:2),1000,replace = TRUE)
plot(table(AF[which(TYPE==1)])/length(AF[which(TYPE==1)]),type="l",col="blue",
ylab="Frequency of AF",xlab="AF")
points(table(AF[which(TYPE==2)])/length(AF[which(TYPE==2)]),type="l")
legend("bottomright",c("Type1","Type2"),lty=1,lwd=3,col=c("blue","black"))