Labeling Issue in the Stacked Bar Plot using ggplot2 in R

I have been trying to create a stacked bar chart using the following code, but I am running into a problem when generating the plot. Here are the code and the problem for your reference:
#required packages
require(ggplot2)
require(dplyr)
require(tidyr)
#the data frame
myData <- data.frame(
a = c(70,113),
b = c(243, 238),
c = c(353, 219),
d = c(266, 148),
Gender = c("Male","Female"))
myData <- gather(myData,Age,Value,a:d)
myData <- group_by(myData,Gender) %>% mutate(pos = cumsum(Value) - (0.5 * Value))
# plot bars and add text
p <- ggplot(myData, aes(x = Gender, y = Value)) + geom_bar(aes(fill = Age),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 4)
p
This code produces this plot:
In this figure the "Female" bar is fine. But for the "Male" bar, two of the values, "70" and "243", sit in the same box and the topmost segment has no label. The order of the four labels is correct.
Why am I getting this? How can I correct the figure?

Notice how the numbers aren't in the right colors? By default, ggplot2 stacks the bars from top to bottom in the order of the factor levels of the fill variable, while cumsum() accumulates the label positions from the bottom up. To change the way Age is drawn, reverse the levels of Age:
myData <- gather(myData,Age,Value,a:d)
myData <- group_by(myData,Gender) %>%
mutate(pos = cumsum(Value) - (0.5 * Value),
Age=forcats::fct_rev(factor(Age)))
Then you will get the ordering of your bars that matches the cumsum that you calculated.
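As an alternative to computing pos by hand, ggplot2 can place the labels itself. The sketch below is not from the original answer: it assumes a ggplot2 version that provides position_stack(vjust = ...), and it rebuilds the question's data directly in long format so that tidyr is not needed. Because the labels are stacked by the same rule as the bars, they stay centered in their segments whatever the level order is.

```r
library(ggplot2)

# Same values as in the question, already in long format
myData <- data.frame(
  Gender = rep(c("Male", "Female"), times = 4),
  Age    = rep(c("a", "b", "c", "d"), each = 2),
  Value  = c(70, 113, 243, 238, 353, 219, 266, 148)
)

# fill is set in the global aes() so bars and labels are stacked the same way;
# vjust = 0.5 puts each label in the middle of its segment
p <- ggplot(myData, aes(x = Gender, y = Value, fill = Age)) +
  geom_col() +
  geom_text(aes(label = Value),
            position = position_stack(vjust = 0.5),
            size = 4)
p
```

With this approach the pos column and the fct_rev() call are no longer required.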

Related

plot counts with intervals in x axis using ggplot2

c1 <- c("p2","p3","p1","p2","p1","p3","p4","p4","p4","p1","p1","p2","p2","p3","p4","p2","p1","p4","p3","p3")
c2 <- c(41,146,79,107,131,127,32,88,119,148,32,65,36,23,44,76,100,98,121,104)
df <- data.frame(c1=c1, c2=c2)
I'm trying to create a stacked bar plot in ggplot2 with intervals on the x axis and counts on the y axis.
Conceptually, something like this:
ggplot(df, aes(x=c2.intervals, y=count.c2.occurrences, fill=c1)) + geom_bar()
in which c2.intervals could be 0-70, 71-100 and 101-150.
For example, for the interval 0-70, p1 appears once, p2 3 times, p3 once and p4 twice. These would be the counts for the first stacked column in the plot.
What is the best way to approach this problem?
You can use cut() to define your intervals. Also, based on your description, I assume you want fill = c1 rather than fill = c2?
See if the following serves your purpose:
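library(dplyr)
library(ggplot2)
df %>%
  # cut() bins c2 into the intervals (0,70], (70,100] and (100,150]
  mutate(c2.intervals = cut(c2, breaks = c(0, 70, 100, 150))) %>%
  ggplot(aes(x = c2.intervals, fill = c1)) +
  # geom_bar() counts the rows in each interval/c1 combination
  geom_bar()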
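To check what geom_bar() will count, the same binning can be tabulated in base R. This is only a sketch for inspection, not part of the original answer. The names intervals and counts are introduced here for illustration.

```r
# Data from the question
c1 <- c("p2","p3","p1","p2","p1","p3","p4","p4","p4","p1",
        "p1","p2","p2","p3","p4","p2","p1","p4","p3","p3")
c2 <- c(41,146,79,107,131,127,32,88,119,148,
        32,65,36,23,44,76,100,98,121,104)

# cut() returns a factor; by default the intervals are closed on the right
intervals <- cut(c2, breaks = c(0, 70, 100, 150))

# Cross-tabulate interval against c1: each row is one stacked column of the plot
counts <- table(interval = intervals, c1 = c1)
counts
```

The first row of the table corresponds to the first stacked column described in the question: p1 once, p2 three times, p3 once and p4 twice.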

Creating a grouped bar plot in R using barplot() from raw data

I have the following data:
CT VT TT
A* 5.923076923 6.529411765 5.305555556
Not A* 5.555555556 6.434782609 5.352941176
I want to make a grouped bar chart in R from the data such that the grouping is on A* and Not A*, the x-axis ticks are CT, VT and TT and the numeric values are plotted in the y-direction.
What do I need to do to produce the bar plot from this raw .csv data?
Next time, please provide a reproducible example. Here is how I would create the desired bar plot with ggplot2.
Before jumping into the main body, make sure you have the required packages installed as follows:
install.packages(c("ggplot2","reshape2"))
Now for a stacked bar chart:
require(ggplot2)
require(reshape2)
data <- data.frame(CT = c(5.923076923, 5.555555556),
                   VT = c(6.529411765, 6.434782609),
                   TT = c(5.305555556, 5.352941176))
rownames(data) <- c("A*", "Not A*")
# reshape2::melt() on a matrix returns the columns Var1 (row names), Var2 (column names) and value
long_format <- reshape2::melt(as.matrix(data))
ggplot(long_format, aes(x = Var2,
y = value,
fill = Var1)) +
geom_col()
A grouped bar chart:
ggplot(data = long_format,
aes(x = Var2,
y = value,
fill = Var1)) +
geom_bar(position = "dodge",
stat = "identity")
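Since the question title asks for barplot(), here is a base-graphics sketch of the same grouped chart. It is not part of the original answer. The matrix name scores is made up for this example, and the values are typed in from the question rather than read from the .csv file, whose name is not given.

```r
# Values copied from the question; rows are the groups, columns the x-axis ticks
scores <- matrix(
  c(5.923076923, 6.529411765, 5.305555556,
    5.555555556, 6.434782609, 5.352941176),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A*", "Not A*"), c("CT", "VT", "TT"))
)

# beside = TRUE draws the rows side by side instead of stacking them;
# barplot() returns the x positions of the bar midpoints
mids <- barplot(scores, beside = TRUE, legend.text = TRUE,
                ylim = c(0, 8), ylab = "value")
```

Leaving out beside = TRUE gives the stacked version of the same plot.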

Geom tile white space issue when the x variable is spread unevenly across facet grids

I'm trying to produce a heat map of gene expression from samples of different conditions, faceted by the conditions:
require(reshape2)
set.seed(1)
expression.mat <- matrix(rnorm(100*1000),nrow=100)
df <- reshape2::melt(expression.mat)
colnames(df) <- c("gene","sample","expression")
df$condition <- factor(c(rep("C1",2500),rep("C2",3500),rep("C3",3800),rep("C4",200)),levels=c("C1","C2","C3","C4"))
I'd like to color by expression range:
df$range <- cut(df$expression,breaks=6)
The width parameter in ggplot's aes is supposed to control the width of the different facets. My question is how to find the optimal width value such that the figure is not distorted?
I played around a bit with this plot command:
require(ggplot2)
ggplot(df,aes(x=sample,y=gene,fill=range,width=100))+facet_grid(~condition,scales="free")+geom_tile(color=NA)+labs(x="condition",y="gene")+theme_bw()
Setting width to be below 100 leaves gaps in the last facet (with the lowest number of samples), and already at this value of 100 you can see that the right column in the first facet from left is distorted (wider than the columns to its left):
So my question is how to fix this/find a width that doesn't cause this.
Edit showing the issue with the sample variable faceted by condition
There is no C1 sample between 25 and 100,
because those sample numbers belong to C2, C3 and C4.
Here is an illustration for sample < 200.
ggplot(df[df$sample < 200, ],
aes(x=sample, y = gene, fill=range)) +
geom_tile() +
facet_grid(~condition)
The number of samples is not the same in all facets, and faceting on condition leaves holes between the sample numbers of each condition.
One way to get around this problem is to
create a sample2 number that restarts at 1 within each condition. I use the dplyr package for this.
library(dplyr)
sample2 <- df %>%
group_by(condition) %>%
distinct(sample) %>%
mutate(sample2 = 1:n())
df <- df %>%
left_join(sample2, by = c("condition", "sample"))
Then plot using sample2 as the x variable
ggplot(df,aes(x = sample2, y = gene,
fill = range))+
facet_grid(~condition) +
geom_tile(color=NA) + theme_bw()
Using the scales argument to vary scales on the x axis.
ggplot(df,aes(x = sample2, y = gene,
fill = range))+
facet_grid(~condition, scales = "free") +
geom_tile() + theme_bw()
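Another option, not in the original answer, is to keep the original sample numbers and let facet_grid() size each panel in proportion to its x range with space = "free_x", so that every tile has the same width. The sketch below uses a smaller stand-in data set (20 genes by 100 samples, built with expand.grid()) with the same unequal split of samples across conditions. That data set is an assumption made for illustration, not the question's exact data.

```r
library(ggplot2)

set.seed(1)
# Smaller stand-in for the question's data: 20 genes x 100 samples
df <- expand.grid(gene = 1:20, sample = 1:100)
df$expression <- rnorm(nrow(df))
# Unequal numbers of samples per condition (25, 35, 38 and 2)
df$condition <- cut(df$sample, breaks = c(0, 25, 60, 98, 100),
                    labels = c("C1", "C2", "C3", "C4"))
df$range <- cut(df$expression, breaks = 6)

# scales = "free_x" drops the unused x range in each panel;
# space = "free_x" makes panel widths proportional to that range
p <- ggplot(df, aes(x = sample, y = gene, fill = range)) +
  facet_grid(~condition, scales = "free_x", space = "free_x") +
  geom_tile(color = NA) +
  labs(x = "sample", y = "gene") +
  theme_bw()
p
```

No width aesthetic is set here, so nothing has to be tuned by hand.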
Old answer using width
See for example this answer.
Adding a width aesthetic produces wider columns:
ggplot(df,aes(x = sample, y = gene,
fill = range, width = 50))+
facet_grid(~condition) +
geom_tile(color=NA) +
labs(x="condition",y="gene")+theme_bw()

How to do stacked bar plot in R? (including the value of the var)

I need your help.
I have been trying to make a stacked bar plot in R, without success so far. I have read several posts, but none of them got me there either.
Since I am a newbie, this is the chart I want (I made it in Excel):
And this is how my data is laid out:
Thank you in advance.
I would use the package ggplot2 to create this plot, as it makes positioning text labels easier than the base graphics package:
# First we create a dataframe using the data taken from your excel sheet:
myData <- data.frame(
Q_students = c(1000,1100),
Students_with_activity = c(950, 10000),
Average_debt_per_student = c(800, 850),
Week = c(1,2))
# The data in the dataframe above is in 'wide' format, to use ggplot
# we need to use the tidyr package to convert it to 'long' format.
library(tidyr)
myData <- gather(myData,
Condition,
Value,
Q_students:Average_debt_per_student)
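# To add the text labels we calculate the midpoint of each bar segment and
# add this as a column to our dataframe using the package dplyr.
# Current ggplot2 versions stack the first factor level on top, so the levels
# are reversed here to match the bottom-up order used by cumsum():
library(dplyr)
myData$Condition <- factor(myData$Condition,
                           levels = rev(unique(myData$Condition)))
myData <- group_by(myData,Week) %>%
  mutate(pos = cumsum(Value) - (0.5 * Value))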
#We pass the dataframe to ggplot2 and then add the text labels using the positions which
#we calculated above to place the labels correctly halfway down each
#column using geom_text.
library(ggplot2)
# plot bars and add text
p <- ggplot(myData, aes(x = Week, y = Value)) +
geom_bar(aes(fill = Condition),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 3)
#Add title
p <- p + ggtitle("My Plot")
#Plot p
p
# Base-graphics alternative: one column per week, one row per variable
so <- data.frame(week1 = c(1000,950,800), week2 = c(1100,10000,850), row.names = c("Q students","students with Activity","average debt per student"))
barplot(as.matrix(so))

2 stacked histograms with a common x-axis

I want to plot two stacked histograms that share a common x-axis, with the second histogram drawn as the mirror image of the first (pointing downward). I found this post that shows how to plot the stacked histograms (How to plot multiple stacked histograms together in R?). For the sake of simplicity, let's say I just want to plot that same histogram again on the same x-axis, but pointing in the negative y direction.
You could count up cases and then multiply the count by -1 for one category. Example with data.table / ggplot
library(data.table)
library(ggplot2)
# fake data
set.seed(123)
dat <- data.table(value = factor(sample(1:5, 200, replace = TRUE)),
                  category = sample(c('a', 'b'), 200, replace = TRUE))
# count by val/category; cat b as negative
plot_dat <-
dat[, .(N = .N * ifelse(category=='a', 1, -1)),
by=.(value, category)]
# plot
ggplot(plot_dat, aes(x=value, y=N, fill=category)) +
geom_bar(stat='identity', position='identity') +
theme_classic()
You can try something like this:
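library(ggplot2)
# after_stat() needs ggplot2 >= 3.3.0; older versions use -..count.. instead
ggplot() +
  stat_bin(data = diamonds, aes(x = depth)) +
  stat_bin(data = diamonds, aes(x = depth, y = -after_stat(count)))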
Responding to the additional comment:
library(dplyr)
library(tidyr)
d1 <- diamonds %>%
select(depth,table) %>%
gather(key = grp,value = val,depth,table)
# after_stat() needs ggplot2 >= 3.3.0; older versions use -..count.. instead
ggplot() +
  stat_bin(data = d1, aes(x = val, fill = grp)) +
  stat_bin(data = diamonds, aes(x = price, y = -after_stat(count)))
Visually, that's a bad example because the scales of the variables are all off, but that's the general idea.
