GGplot geom_bar stack items with even spacing - r

I would like to draw a stacked bar chart that looks roughly like this:
It was made with a little data.table and the ggplot2 code below:
library(data.table)
library(ggplot2)
dt <- data.table(id = seq(15), pvalue = c(0.0323616533686601, 0.00405825892193357, 0.00406609088355357, 0.00252697950679603, 0.00277696431629866, 0.0212521760053885, 0.0315721033650767, 0.00716594255390525, 0.00829537987151543, 0.0163753389504665, 0.0328650069220695, 0.0146991756928858, 0.0178425139730873, 0.00345987886149332, 0.0499748920124661))
ggplot(dt, aes(1, id, fill = pvalue)) + geom_bar(stat = 'identity')
But I'm looking for a slight modification. The id column ranges from 1 to 15 and is mapped to y, so each segment's height equals its id. I would like every segment to have the same height instead.
This can be achieved with this bit of code:
ggplot(dt, aes(id, fill = pvalue)) + geom_bar(stat = 'count') + coord_flip()
But when I run this, I lose the ability to color the bars correctly (with scale_fill_gradient2).
Let me know if you find a nice solution :)

I think adding group = is what you are after:
ggplot(dt, aes(y = id, fill = pvalue, group = id)) +
  geom_bar()
And if you map y = instead of x =, you don't need coord_flip() (this relies on ggplot2 3.3.0 or later, where geom_bar() detects the orientation).
P.S. geom_col() is the same as geom_bar(stat = 'identity').
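If the goal is one stacked bar whose segments all have the same height, another option is to give every row a constant height and let geom_col() stack the rows. This is only a sketch and not part of the answer above: the constant y = 1, the midpoint value and the name p_even are my own assumptions, and the p-values are rounded for brevity.

```r
library(ggplot2)

# Same ids as in the question; p-values rounded to four decimals.
dt <- data.frame(
  id = 1:15,
  pvalue = c(0.0324, 0.0041, 0.0041, 0.0025, 0.0028, 0.0213, 0.0316, 0.0072,
             0.0083, 0.0164, 0.0329, 0.0147, 0.0178, 0.0035, 0.0500)
)

# Every row contributes a segment of height 1. group = id keeps the rows
# as separate segments, so the continuous fill is applied per row.
p_even <- ggplot(dt, aes(x = 1, y = 1, fill = pvalue, group = id)) +
  geom_col() +
  scale_fill_gradient2(midpoint = 0.025)  # midpoint is an arbitrary choice

p_even
```

Because the height no longer depends on id, the fill scale stays free for the p-value.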

Related

How to add segments in a lollipop chart ggplot2 in R

Here's my initial barchart code:
Full %>% ggplot(aes(x = reorder(POS, -Iconicity), y = Iconicity, fill = Group)) +
geom_bar(stat = "summary", position=position_dodge(width = 0.9)) +
scale_fill_viridis_d() + # color-blind compatible colors
theme_minimal() + xlab("POS")
Which creates this lovely chart:
So I wanted to turn this into a lollipop chart to make it look neater and more modern, and this is the code I used:
Full %>% ggplot(aes(x = reorder(POS, -Iconicity), y = Iconicity, color = Group)) +
geom_point(size=3, stat = "summary", position=position_dodge(width = 0.9)) +
geom_segment(aes(x=POS,
xend=POS,
y=0,
yend=Iconicity)) +
scale_fill_viridis_d() + # color-blind compatible colors
theme_minimal() + xlab("POS")
However, that of course does not add enough segments in the right places, and I can't seem to work out how to change the code. What I'm left with is this:
I'm still quite a novice at R clearly so forgive me
I think the issue is that geom_point is summarised (stat = "summary") and dodged (position_dodge) while geom_segment is neither, so the segments are drawn from the raw rows at undodged positions.
Without a reproducible example of Full it is hard to be sure, but you could try summarising the values outside of ggplot and applying the same position_dodge to every geom:
Full %>%
  group_by(POS, Group) %>% # keep POS, it is needed for the x axis
  summarise(Iconicity = mean(Iconicity, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(POS, -Iconicity), y = Iconicity, color = Group)) +
  geom_point(size = 3, position = position_dodge(width = 0.9)) +
  geom_segment(aes(x = POS,
                   xend = POS,
                   y = 0,
                   yend = Iconicity), position = position_dodge(0.9)) +
  scale_color_viridis_d() + # color-blind compatible colors (color, not fill, is mapped)
  theme_minimal() + xlab("POS")
Does this answer your question?
If not, please provide a reproducible example of your Full dataset (see this link: How to make a great R reproducible example)
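A variant of the same idea, as a sketch only: geom_linerange() takes ymin and ymax at a single x, so it can be dodged the same way as the points. In some ggplot2 versions, dodging a geom_segment() may not move xend, which gives slanted segments. The data frame full_demo below is a made-up stand-in for Full (its values are invented), and the names means, dodge and p_lollipop are mine.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for `Full`; the numbers are invented.
full_demo <- data.frame(
  POS = rep(c("Noun", "Verb", "Adjective"), each = 4),
  Group = rep(c("A", "B"), times = 6),
  Iconicity = c(1.2, 0.8, 1.0, 0.9, 2.1, 1.7, 1.9, 1.5, 0.5, 0.7, 0.6, 0.4)
)

# One mean per POS/Group combination, computed outside ggplot.
means <- full_demo %>%
  group_by(POS, Group) %>%
  summarise(Iconicity = mean(Iconicity, na.rm = TRUE), .groups = "drop")

# Share one dodge object so sticks and points land on the same x positions.
dodge <- position_dodge(width = 0.9)

p_lollipop <- ggplot(means, aes(x = reorder(POS, -Iconicity), y = Iconicity,
                                color = Group)) +
  geom_linerange(aes(ymin = 0, ymax = Iconicity), position = dodge) +
  geom_point(size = 3, position = dodge) +
  scale_color_viridis_d() +
  theme_minimal() +
  xlab("POS")

p_lollipop
```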

getting a ggplot2 to display relative contributions of each element to the total

genocount <-ggplot(SNPs, aes(genotype))
genocount + geom_bar()
Creates this Bar Chart:
I would like to be able to display the percentage contribution per chromosome to each genotype in a stacked orientation (those are displayed along the x axis). I've tried some methods that I've seen suggested by others, but they return different errors...I'm not sure if there's an incompatibility with my data set or if it's something else.
Thanks for your help!
library(scales)
ggplot(SNPs, aes(genotype))
genocount + geom_bar(aes(position = "fill", fill = chromosome))+
geom_text(aes(label = percent(chromosome/sum(chromosome))))
scale_y_continuous(labels = percent_format())
I know exactly what you mean.
In the case of a bar chart the code is the following
ggplot(mydf) +
geom_bar(aes(x = var1,y = (..count..)/sum(..count..)),
stat = "count",position = "identity")
In the case of a histogram, the code is the following:
ggplot(data = df) +
geom_histogram(aes(x = var1, y = (..count..)/sum(..count..)),
position = "identity")
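Don't ask me exactly how ..count.. works; as far as I know it is a variable computed by the stat (the number of rows in each bar or bin), and ggplot2 3.3.0 and later also accept after_stat(count) for it.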
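As a sketch of what the computed count does, here is the same idea on the built-in mtcars data, used only as a stand-in because SNPs is not shown. It assumes ggplot2 3.3.0 or later for after_stat(); the names p_share and p_fill are mine. The second plot uses position = "fill", which may be closer to the stacked percentage view the question asks for.

```r
library(ggplot2)

# after_stat(count) is the per-bar count computed by stat_count();
# dividing by sum(count) turns the counts into shares of all rows.
p_share <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)

# For the composition within each bar, position = "fill" rescales
# every stacked bar to a total height of 1.
p_fill <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
```

Note that position is an argument of geom_bar(), not an aesthetic, so it goes outside aes().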

Proportional barplot in R

I am trying to create a proportional barplot in R, so far I have managed to do something like this:
library(ggplot2)
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..prop..,fill=color))
This obviously does not work, but neither does this:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..prop..,fill=color,group=1))
or this:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..count../sum(..count..),fill=color))
This works:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..count../sum(..count..),fill=color),position="fill")
But I would like bars to be side by side within a category.
What I want is a proportional barplot without transforming my data beforehand.
I think you need to aggregate first and then use position="dodge":
diamonds2 <- aggregate(carat ~ cut + color, diamonds, length)
ggplot(data = transform(diamonds2, p = ave(carat, cut, FUN = function(x) x/sum(x))),
aes(x = cut, y = p, fill=color))+
geom_bar(stat = "identity", position = "dodge")
The resulting plot:
EDIT after OP's comment
If you want conditional, side-by-side bars, use geom_bar(stat = "identity", position = "dodge") in your ggplot2 call (I plot only the first 100 rows of the data for the sake of clarity):
library(ggplot2)
ggplot(data = diamonds[1:100, ], aes(cut, carat, fill = color)) + geom_bar(stat = "identity", position = "dodge")
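For comparison, the aggregate-then-dodge step from the first part of this answer can also be written with dplyr. This is a sketch under the assumption that dplyr is available; it still transforms the data before plotting, and the names diamonds_share and p_dodge are mine.

```r
library(ggplot2)
library(dplyr)

# Count rows per cut/color pair, then divide by the total of each cut,
# so p is the share of each color within its cut.
diamonds_share <- diamonds %>%
  count(cut, color) %>%
  group_by(cut) %>%
  mutate(p = n / sum(n)) %>%
  ungroup()

# geom_col() is geom_bar(stat = "identity"); "dodge" places the colors
# side by side within each cut.
p_dodge <- ggplot(diamonds_share, aes(x = cut, y = p, fill = color)) +
  geom_col(position = "dodge")

p_dodge
```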

ggplot2: geom_bar with group, position_dodge and fill

I am trying to generate a barplot in which the x-axis is by patient, with each patient having multiple samples. So for instance (using the mtcars data as a template of what the data would look like):
library("ggplot2")
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear))) +
geom_bar(position = position_dodge(width = 0.8), binwidth = 25) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
This would produce something like this:
With each barplot representing a sample in each patient.
I want to add additional information about each patient sample by using colors to fill the barplots (e.g. different types of mutations in each patient sample). I was thinking I could specify the fill parameter like this:
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8), binwidth = 25) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But this doesn't produce "stacked barplots" for each patient sample barplot. I am assuming this is because position_dodge() is set. Is there any way to get around this? Basically, what I want is:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar() +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But with these colors available in the first plot I listed. Is this possible with ggplot2?
I think facets are the closest approximation to what you seem to be looking for:
ggplot(mtcars, aes(x = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8), binwidth = 25) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample") +
facet_wrap(~cyl)
I haven't found anything related in the issue tracker of ggplot2.
If I understand your question correctly, you want to pass in aes() into your geom_bar layer. This will allow you to pass a fill aesthetic. You can then place your bars as "dodge" or "fill" depending on how you want to display the data.
A short example is listed here:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar(aes(fill = factor(vs)), position = "dodge", binwidth = 25) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
With the resulting plot: http://imgur.com/ApUJ4p2 (sorry S/O won't let me post images yet)
Hope that helps!
I have hacked around this a few times by layering multiple geom_col() calls on top of each other in the order I prefer. For example, this code let me produce the feature you're looking for without faceting:
ggplot(data, aes(x = cert, y = pct, fill = Party, group = Treatment, shape = Treatment)) +
  geom_col(aes(x = cert, y = 1), position = position_dodge(width = .9), fill = "gray90") +
  geom_col(position = position_dodge(width = .9)) +
  scale_fill_manual(values = c("gray90", "gray60"))
Notice how I set the background layer's y value to 1. To add more layers, you can cumulatively sum your variables.
Image of the plot:
I guess, my answer in this post will help you to build the chart with multiple stacked vertical bars for each patient ...
Layered axes in ggplot?
One way I don't see suggested above is to use facet_wrap to group samples by patient and then stack mutations by sample. This removes the need for dodging. I also changed which mtcars attributes are used, to match the question and to get more variety in the mutations attribute.
patients <-c('Tom','Harry','Sally')
samples <- c('S1','S2','S3')
mutations <- c('M1','M2','M3','M4','M5','M6','M7','M8')
ds <- data.frame(
patients=patients[mtcars$cyl/2 - 1],
samples=samples[mtcars$gear - 2],
mutations=mutations[mtcars$carb]
)
ggplot(
ds,
aes(
x = factor(samples),
group = factor(mutations),
fill = factor(mutations)
)
) +
geom_bar() +
facet_wrap(~patients,nrow=1) +
ggtitle('Patient') +
xlab('Sample') +
ylab('Number of Mutations per Patient Sample') +
labs(fill = 'Mutation')
Output now has labels that match the specific language of the request...easier to see what is going on.

What is the simplest method to fill the area under a geom_freqpoly line?

The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. The column is a factor, where each interval is a different factor level.
Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to have a line, like that provided by geom_freqpoly, with the area filled.
Currently I'm using geom_freqpoly like this:
ggplot(quake.data, aes(interval, fill=tweet.type)) + geom_freqpoly(aes(group = tweet.type, colour = tweet.type)) + theme(axis.text.x = element_text(angle = -60, hjust = 0, size = 6))
I would prefer to have a filled area, such as provided by geom_density, but without smoothing the line:
The geom_area has been suggested, is there any way to use a ggplot2-generated statistic, such as ..count.., for the geom_area's y-values? Or, does the count aggregation need to occur prior to using ggplot2?
As stated in the answer, geom_area(..., stat = "bin") is the solution:
ggplot(quake.data, aes(interval)) + geom_area(aes(y = ..count.., fill = tweet.type, group = tweet.type), stat = "bin") + theme(axis.text.x = element_text(angle = -60, hjust = 0, size = 6))
produces:
Perhaps you want:
geom_area(aes(y = ..count..), stat = "bin")
geom_ribbon can be used to produce a filled area between two lines without needing to explicitly construct a polygon. There is good documentation here.
ggplot(quake.data, aes(interval, fill=tweet.type, group = 1)) + geom_density()
But I don't think this is a meaningful graphic.
I'm not entirely sure what you're aiming for. Do you want a line or bars? You should check out geom_bar for filled bars. Something like:
p <- ggplot(data, aes(x = time, y = count))
p + geom_bar(stat = "identity")
If you want a line with the area underneath filled in, look at geom_area. I haven't used it personally, but the call appears to be almost the same:
p <- ggplot(data, aes(x = time, y = count))
p + geom_area()
Hope that helps. Give some more info and we can probably be more helpful.
Actually, I would add an index column (just the row number of the data), use that as x, and then use
p <- ggplot(data, aes(x = index, y = count))
p + geom_bar(stat = "identity") + scale_x_continuous("Intervals",
breaks = index, labels = intervals)
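Returning to the question of whether the counts must be aggregated before plotting: aggregating first is one workable route when the interval column is a factor. The sketch below uses simulated data, since quake.data is not shown; quake_demo, counts and p_area are hypothetical names, and the tweet types are invented.

```r
library(ggplot2)

# Simulated stand-in for quake.data: a factor of intervals and a tweet type.
set.seed(1)
quake_demo <- data.frame(
  interval = factor(sample(1:10, 200, replace = TRUE)),
  tweet.type = sample(c("original", "retweet"), 200, replace = TRUE)
)

# Count rows per interval and tweet type before plotting.
counts <- as.data.frame(table(interval = quake_demo$interval,
                              tweet.type = quake_demo$tweet.type))

# geom_area() needs a continuous x, so use the factor's integer codes
# and put the original level names back as axis labels.
p_area <- ggplot(counts, aes(x = as.integer(interval), y = Freq,
                             fill = tweet.type)) +
  geom_area(position = "identity", alpha = 0.5) +
  scale_x_continuous("Intervals",
                     breaks = seq_along(levels(counts$interval)),
                     labels = levels(counts$interval))

p_area
```

position = "identity" overlays the two areas, like the freqpoly lines; the default stacks them instead.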
