I'm using ggplot2 to draw a barplot stacked to 100% with relative values, using the position = "fill" option in geom_bar().
Here is my code:
library(ggplot2)
test <- data.frame(x = c('a','a','a','b','b','b','b'),
                   k = c('k','j','j','j','j','k','k'),
                   y = c(1,3,4,2,5,9,7))
plot <- ggplot(test, aes(x =x, y = y, fill = k))
plot <- plot + geom_bar(position = "fill",stat = "identity")
plot <- plot + scale_fill_manual(values = c("#99ccff", "#ff6666"))
plot <- plot + geom_hline(yintercept = 0.50)+ggtitle("test")
plot
Here is the result:
However, I need to add labels to the bars, including the "sub bars". To do this, I tried geom_text():
plot + geom_text(aes(label=y, size=4))
But the result is not good. I tried the hjust and vjust parameters without luck, and also something like:
plot + geom_text(aes(label=y/sum(y), size=4))
But I did not get the result I need (I'm leaving out the other attempts so as not to overload the question with useless images; if needed, please ask!).
Any idea how to get nicely centered labels?
label specifies what to show, and y specifies where to show it. Since you are using proportions for the y-axis with position = "fill", you need to calculate the label positions (geom_text(aes(y = ...))) as proportions within each x, using cumulative sums. Additionally, to display only the total proportion of a given color, you need to keep only the last row of each x, k combination. Here, I am building a separate test_labels dataset for geom_text to display the custom labels:
library(dplyr)
library(ggplot2)
test <- data.frame(x = c('a','a','a','b','b','b','b'),
                   k = c('k','j','j','j','j','k','k'),
                   y = c(1,3,4,2,5,9,7))
test_labels <- test %>%
  arrange(x, desc(k)) %>%
  group_by(x) %>%
  # cumulative share gives the top edge of each segment within its bar
  mutate(ylabel_pos = cumsum(y)/sum(y),
         ylabel = y/sum(y)) %>%
  # .add = TRUE keeps the grouping by x (older dplyr versions spell it add = TRUE)
  group_by(k, .add = TRUE) %>%
  mutate(ylabel = sum(ylabel)) %>%
  slice(n())
ggplot(test, aes(x = x, y = y, fill = k)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_manual(values = c("#99ccff", "#ff6666")) +
  geom_hline(yintercept = 0.50) +
  geom_text(data = test_labels,
            aes(y = ylabel_pos, label = paste(round(ylabel*100, 1), "%")),
            vjust = 1.6, color = "white", size = 3.5) +
  ggtitle("test")
Result:
> test_labels
# A tibble: 4 x 5
# Groups:   x, k [4]
       x      k     y ylabel_pos    ylabel
  <fctr> <fctr> <dbl>      <dbl>     <dbl>
1      a      j     4  1.0000000 0.8750000
2      a      k     1  0.1250000 0.1250000
3      b      j     5  1.0000000 0.3043478
4      b      k     7  0.6956522 0.6956522
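If the order of the raw rows within a bar does not matter, a shorter route is to aggregate to one row per x/k combination and let position_fill(vjust = 0.5) centre each label inside its segment. This is only a minimal sketch, not the method used above; the names agg, share and p are made up for illustration.

```r
library(dplyr)
library(ggplot2)

test <- data.frame(x = c('a','a','a','b','b','b','b'),
                   k = c('k','j','j','j','j','k','k'),
                   y = c(1,3,4,2,5,9,7))

# One row per x/k combination, plus that combination's share within its x
agg <- test %>%
  group_by(x, k) %>%
  summarise(y = sum(y), .groups = "drop") %>%
  group_by(x) %>%
  mutate(share = y / sum(y)) %>%
  ungroup()

p <- ggplot(agg, aes(x = x, y = y, fill = k)) +
  geom_col(position = "fill") +
  # position_fill(vjust = 0.5) places each label at the middle of its segment
  geom_text(aes(label = scales::percent(share, accuracy = 0.1)),
            position = position_fill(vjust = 0.5),
            color = "white", size = 3.5) +
  scale_fill_manual(values = c("#99ccff", "#ff6666")) +
  geom_hline(yintercept = 0.50) +
  ggtitle("test")
p
```

Because the labels use the same position adjustment as the bars, no manual cumulative sums are needed; the trade-off is that the data must be aggregated first so that each segment gets exactly one label.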
I have a relatively large dataset that I can share here.
I am trying to plot all the lines (not just one: e.g. a mean or a median) corresponding to the values of y over x = G, with the data grouped by I and P; so that the levels of the variable I appear with a different colour and the levels of the variable P appear with a different line type.
The problem I have is that the graph I get is a zig-zag line graph along the x-axis. The aim, obviously, is to have a line for each combination of data, avoiding the zig-zag. I have read that this problem could be related to the way the data is grouped. I have tried several combinations of data grouping using group but I can't solve the problem.
The code I use is as follows:
library(dplyr)
library(ggplot2)
library(RColorBrewer)
# Selecting colours
colours <- brewer.pal(n = 11, name = "Spectral")[c(9,11,1)]
# Creating plot
data %>%
  ggplot(aes(x = G, y = y, color = I, linetype = P)) +
  geom_line(aes(linetype = P, color = I), size = 0.2) +
  scale_linetype_manual(values = c("solid", "dashed")) +
  scale_color_manual(values = colours) +
  scale_x_continuous(breaks = seq(0, 100, by = 25), limits = c(0, 100)) +
  scale_y_continuous(breaks = seq(0, 1, by = 0.25), limits = c(0, 1)) +
  labs(x = "Time", y = "Value") +
  theme_classic()
I also tried, unsuccessfully, adding group=interaction(I, P) inside ggplot(aes()), as suggested in other forums.
Following @JonSpring's point:
dd2 <- (filter(dd,G %in% c(16,17))
%>% group_by(P,I,G)
%>% summarise(n=length(unique(y)))
)
shows that you have many different values of y for each combination of G/I/P:
# A tibble: 12 x 4
# Groups: P, I [6]
P I G n
<chr> <chr> <dbl> <int>
1 heterogeneity I005 16 34
2 heterogeneity I005 17 37
3 heterogeneity I010 16 34
... [etc.]
One way around this, if you so choose, is to use stat_summary() to have R collapse the y values in each group to their mean:
(dd %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
stat_summary(fun=mean, geom="line",
aes(linetype=P,color=I,group=interaction(I,P)),size=0.2) +
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
labs(x = "Time", y = "Value") +
theme_classic()
)
You could also do this yourself with group_by() + summarise() before calling ggplot.
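A minimal sketch of that pre-aggregation route follows. The original dd is not reproduced here, so toy is a hypothetical stand-in with several y values per G/I/P combination, and the level "homogeneity" is invented purely for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dd: three y values per G/I/P combination
set.seed(1)
toy <- expand.grid(G = 0:4,
                   I = c("I005", "I010"),
                   P = c("heterogeneity", "homogeneity"),
                   rep = 1:3,
                   stringsAsFactors = FALSE)
toy$y <- runif(nrow(toy))

# Collapse to one mean per G/I/P so each line has a single y for each x
toy_mean <- toy %>%
  group_by(P, I, G) %>%
  summarise(y = mean(y), .groups = "drop")

ggplot(toy_mean, aes(x = G, y = y, color = I, linetype = P,
                     group = interaction(I, P))) +
  geom_line() +
  labs(x = "Time", y = "Value") +
  theme_classic()
```

Once every G/I/P combination has exactly one row, geom_line() has nothing left to zig-zag between.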
There's not enough information in the data set as presented to identify individual lines. If we are willing to assume that the order of the values within a given I/G/P group is an appropriate indexing variable, then we can do this:
## add index variable
dd3 <- dd %>% group_by(P,I,G) %>% mutate(index=seq(n()))
(dd3 %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
geom_line(aes(group=interaction(index,I,P)), size=0.2) +
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
labs(x = "Time", y = "Value") +
theme_classic()
)
If this isn't what you had in mind, then you need to provide more information ...
Suppose I have this data:
xy <- data.frame(x = c(1,2,3,4,5,2,3,4), y = c(rep('A',5),rep('B',3)))
So, when I type
ggplot(xy, aes(x = x, fill = y)) +
geom_histogram(aes(y=..count../sum(..count..)), position = "dodge")
I get this graphic:
But I wanted to see the levels independently leveled, i.e., the red bars leveled to 0.2 and the blue bars leveled to 0.333. How can I achieve it?
Also, how can I set the y-axis to show the numbers in percentage instead of decimals?
Many thanks in advance.
This seems to do the job. It uses ..density.. rather than ..count.., multiplies by the number of levels in the A/B factor column (counted in a rather ugly way with length(unique(xy$y))), and then uses the scales package to format the y-axis labels as percentages:
ggplot(xy, aes(x = x, fill = y)) +
geom_histogram(aes(y=..density../sum(..density..)*length(unique(xy$y)), group = y), position = "dodge") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1))
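Since x takes only a handful of distinct values here, another option is geom_bar() with the computed variable prop, which stat_count() calculates within each group. The sketch below assumes ggplot2 3.3.0 or later (for after_stat()) and swaps geom_histogram() for geom_bar(), so it is a variation on the approach above rather than the same technique.

```r
library(ggplot2)

xy <- data.frame(x = c(1,2,3,4,5,2,3,4),
                 y = c(rep('A',5), rep('B',3)))

p <- ggplot(xy, aes(x = x, fill = y)) +
  # prop is computed per group, so group = y normalises A and B separately
  geom_bar(aes(y = after_stat(prop), group = y), position = "dodge") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1))
p
```

Each level of y is divided by its own total, which gives bars at 1/5 for A and 1/3 for B.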
As an alternative to calculating everything inside ggplot, you can compute the relative frequencies first and then plot them with geom_col. preserve = "single" keeps the bars at equal width:
library(ggplot2)
library(dplyr)
xy <- data.frame(x = c(1,2,3,4,5,2,3,4),
                 y = c(rep('A',5),rep('B',3)))
xy <- xy %>%
  group_by(y, x) %>%
  summarise(n = n()) %>%
  # after summarise() the data are still grouped by y, so sum(n) is the total per level of y
  mutate(rel_freq = n / sum(n))
ggplot(xy, aes(x = x, y = rel_freq, fill = y)) +
geom_col(position = position_dodge2(preserve = "single")) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1))
I am trying to create a custom histogram with a rug plot showing the original values on the X axis.
I am going to use the mtcars dataset to illustrate. It's not the best dataset for this question, but hopefully the reader will understand what I am trying to achieve.
Below shows the basic histogram, without any rug plot attempt.
I want to create the histogram using geom_bar as this allows for more flexibility with custom bins.
I also want a small gap between the histogram bars (i.e. width = 0.95), which adds to this problem's complexity.
library(dplyr)
library(ggplot2)
# create custom bins
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mtcars$bin <- cut(mtcars$mpg, breaks = vct_seq)
# create data.frame for the ggplot graph..using bins above
df_mtcars_count <- mtcars %>% group_by(bin) %>% summarise(count = n())
# indicative labels
vct_labels <- c("bin 1", "bin 2", "bin 3", "bin 4")
# attempt 1 - basic plot -- no rug plot
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p
Next, try and add a basic rug plot on the X axis. This obviously doesn't work as the geom_bar and geom_rug have completely different scales.
# attempt 2 with no scaling.... doesn't work as x scale for ordinal (bins) and
# x scale for continuous (mpg) do not match
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg), inherit.aes = F, alpha = 0.3)
p
Now, try and rescale the mpg column to match with the ordinal scale....
First define a linear mapping function...
fn_linear_map <- function(vct_existing_val, vct_new_range) {
# example....converts 1:20 into the range 1 to 10 like this:
# fn_linear_map(1:20, c(1, 10))
fn_r_diff <- function(x) x %>% range() %>% diff()
flt_ratio <- fn_r_diff(vct_new_range) / fn_r_diff(vct_existing_val)
vct_old_min_offset <- vct_existing_val - min(vct_existing_val)
vct_new_range_val <- (vct_old_min_offset * flt_ratio) + min(vct_new_range)
return(vct_new_range_val)
}
Now apply the function. We try to map mpg to the range 1 to 4 (an attempt to match the ordinal scale):
mtcars$mpg_remap <- fn_linear_map(mtcars$mpg, c(1, 4))
Try the plot again.... getting closer ... but not really accurate...
# attempt 3: getting closer but doesn't really match the ordinal scale
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg_remap), inherit.aes = F, alpha = 0.3)
p
The graph above is getting close to what I want, but the rug plot does not line up with the actual data. For example, the max observation (33.9) should be displayed almost aligned with the right-hand side of the last bar; see below:
mtcars %>% filter(bin == "(25,34]") %>% arrange(mpg) %>% dplyr::select(mpg, mpg_remap)
Your scale makes no sense to me, as you are showing a bin that is twice as wide using the same bar width. Doing that in combination with a rug strikes me as confusing at best and misleading at worst. I suggest you plot the bars with their correct widths, after which the rug is trivial.
I think the best solution is to just use geom_histogram:
ggplot(mtcars, aes(mpg)) +
geom_histogram(breaks = vct_seq, col = 'grey80') +
geom_rug(aes(mpg, y = NULL))
If you really want the gaps between the bars you'll have to do more work:
library(tidyr)
d <- mtcars %>%
count(bin) %>%
separate(bin, c('min', 'max'), sep = ',', remove = FALSE) %>%
mutate_at(vars('min', 'max'), readr::parse_number) %>%
mutate(
middle = min + (max - min) / 2,
width = 0.9 * (max - min)
)
ggplot(d, aes(middle, n)) +
geom_col(width = d$width) +
geom_rug(aes(mpg, y = NULL), mtcars)
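The bin edges can also be taken straight from the breaks vector instead of being parsed back out of the factor labels. The following is a sketch under that assumption; d2 is a made-up name, and geom_rect() is swapped in for geom_col() so that the bar edges are spelled out explicitly.

```r
library(ggplot2)

vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)

# Lower and upper edge of each bin, read directly from the breaks
d2 <- data.frame(min = head(vct_seq, -1), max = tail(vct_seq, -1))
d2$n <- as.vector(table(cut(mtcars$mpg, breaks = vct_seq)))
d2$middle <- (d2$min + d2$max) / 2
d2$width <- 0.9 * (d2$max - d2$min)   # 10% of each bin is left as a gap

ggplot(d2) +
  geom_rect(aes(xmin = middle - width / 2, xmax = middle + width / 2,
                ymin = 0, ymax = n)) +
  geom_rug(data = mtcars, aes(x = mpg)) +
  labs(x = "mpg", y = "count")
```

Because the bars live on the continuous mpg scale, the rug needs no rescaling at all.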
I need to plot two grouped barplots from two data frames that have different numbers of rows (6 and 5).
I have tried many approaches in R but I don't know how to fix it.
Here are my data frames. The Freq column must be on the y axis, and the inter and intra columns must be on the x axis.
> freqinter
inter Freq
1 0.293040975264367 17
2 0.296736775990729 2
3 0.297619926364764 4
4 0.587377012109561 1
5 0.595245125315916 4
6 0.597022018595893 2
> freqintra
intra Freq
1 0 3
2 0.293040975264367 15
3 0.597022018595893 4
4 0.598809552335782 2
5 0.898227748764939 6
I expect to draw both barplots in the same plot and to distinguish the inter and intra values by colour.
I want a picture like this one:
You probably want a histogram. Use the raw data if possible. For example:
library(tidyverse)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id") %>%
uncount(Freq)
ggplot(df, aes(x, fill = id)) +
geom_histogram(binwidth = 0.1, position = 'dodge', col = 1) +
scale_fill_grey() +
theme_minimal()
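For readers unfamiliar with uncount(): it expands a frequency table back into one row per observation, which is the form geom_histogram() expects. A base-R sketch of the same expansion, shown for freqintra only (raw is a made-up name):

```r
freqintra <- data.frame(
  x = c(0, 0.293040975264367, 0.597022018595893,
        0.598809552335782, 0.898227748764939),
  Freq = c(3, 15, 4, 2, 6))

# Repeat each row index Freq times: one output row per original observation
idx <- rep(seq_len(nrow(freqintra)), times = freqintra$Freq)
raw <- freqintra[idx, "x", drop = FALSE]

nrow(raw)                 # equals sum(freqintra$Freq)
table(round(raw$x, 3))    # recovers the original frequencies
```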
With the data you posted I don't think you can have this graph to look good. You can't have bars thin enough to differentiate 0.293 and 0.296 when your data ranges from 0 to 0.9.
Maybe you could try to treat it as a factor just to illustrate what you want to do:
library(dplyr)
library(ggplot2)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id")
ggplot(df, aes(x = as.factor(x), y = Freq, fill = id)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
You can also check the problem by not treating your x variable as a factor:
ggplot(df, aes(x = x, y = Freq, fill = id)) +
geom_bar(stat = "identity", width = 0.05, position = "dodge") +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
Either the bars must be very thin (small width), or you'll get overlapping x intervals breaking the plot.
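The size of the problem can be checked by looking at the gaps between neighbouring x values. This is just a quick arithmetic sketch in base R; x_all and gaps are made-up names.

```r
x_all <- sort(unique(c(
  0.293040975264367, 0.296736775990729, 0.297619926364764,
  0.587377012109561, 0.595245125315916, 0.597022018595893,
  0, 0.598809552335782, 0.898227748764939)))

# Distance between each pair of neighbouring x positions
gaps <- diff(x_all)
min(gaps)
diff(range(x_all))

# Bars wider than the smallest gap will overlap their neighbours
bar_width <- 0.05
bar_width > min(gaps)
```

The smallest gap is below 0.001 while the data span roughly 0.9, so a width that avoids every overlap leaves bars that are barely visible.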
I have a data set that I need to show as a stacked bar chart for two cohorts over three time periods. Currently, I am faceting by year and filling by the probability values for my DV (the number of times, t, that someone goes to a nursing home: the probability that t = 0, t = 1, ..., t >= 5). I am trying to figure out whether it is possible to introduce a second color scale, so that the "Comparison" bars are filled with a yellow gradient and the "Treatment" bars with a blue gradient. I figure the best way to do this may be to overlay two plots, but I'm not sure whether that is possible in ggplot (or some other package). Code and screenshot are below:
tempPlot <- ggplot(tempDF,aes(x = HBPCI, y = margin, fill=factor(prob))) +
scale_x_continuous(breaks=c(0,1), labels=c("Comparison", "Treatment"))+
scale_y_continuous(labels = scales::percent_format())+
ylab("Prob snf= x")+
xlab("Program Year")+
ggtitle(tempFlag)+
geom_bar(stat="identity")+
scale_fill_brewer(palette = "Blues")+ #can change the color scheme here.
theme(axis.title.y =element_text(vjust=1.5, size=11))+
theme(axis.title.x =element_text(vjust=0.1, size=11))+
theme(axis.text.x = element_text(size=10,angle=-45,hjust=.5,vjust=.5))+
theme(axis.text.y = element_text(size=10,angle=0,hjust=1,vjust=0))+
facet_grid(~yearQual, scales="fixed")
You may want to consider using interaction() -- here's a reproducible solution:
year <- c("BP", "PY1", "PY2")
type <- c("comparison", "treatment")
df <- data.frame(year = sample(year, 100, T),
type = sample(type, 100, T),
marg = abs(rnorm(100)),
fact = sample(1:5, 100, T))
head(df)
# year type marg fact
# 1 BP comparison 0.2794279 3
# 2 PY2 comparison 1.6776371 1
# 3 BP comparison 0.8301721 2
# 4 PY1 treatment 0.6900511 1
# 5 PY2 comparison 0.6857421 3
# 6 PY1 treatment 1.4835672 3
library(ggplot2)
blues <- RColorBrewer::brewer.pal(5, "Blues")
oranges <- RColorBrewer::brewer.pal(5, "Oranges")
ggplot(df, aes(x = type, y = marg, fill = interaction(factor(fact), type))) +
geom_bar(stat = "identity") +
facet_wrap(~ year) +
scale_fill_manual(values = c(blues, oranges))
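The colours line up because interaction() orders its levels with the first factor varying fastest, so all "comparison" levels come before all "treatment" levels. A small sketch to check this and to build a named palette; the colorRampPalette() gradients are base-R stand-ins for the Brewer palettes, and lv and pal are made-up names.

```r
fact <- factor(rep(1:5, times = 2))
type <- factor(rep(c("comparison", "treatment"), each = 5))

# Level order: 1.comparison ... 5.comparison, then 1.treatment ... 5.treatment
lv <- levels(interaction(fact, type))
lv

# Stand-in gradients (assumption: any two 5-colour ramps would do)
blues   <- colorRampPalette(c("lightblue", "darkblue"))(5)
oranges <- colorRampPalette(c("moccasin", "darkorange"))(5)

# Naming the vector ties each colour to a level regardless of order
pal <- setNames(c(blues, oranges), lv)
pal
```

Passing a named vector such as pal to scale_fill_manual(values = ...) makes the mapping explicit, and swapping the two gradients is enough to give the treatment bars the blue ramp instead.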