plot counts with intervals in x axis using ggplot2 - r

c1 <- c("p2","p3","p1","p2","p1","p3","p4","p4","p4","p1","p1","p2","p2","p3","p4","p2","p1","p4","p3","p3")
c2 <- c(41,146,79,107,131,127,32,88,119,148,32,65,36,23,44,76,100,98,121,104)
df <- data.frame(c1=c1, c2=c2)
I'm trying to create a stacked bar plot in ggplot2 with intervals in the x axis and counts in the y axis
Conceptually something like this
ggplot(df, aes(x=c2.intervals, y=count.c2.occurrences, fill=c1)) + geom_bar()
in which c2.intervals could be 0-70, 71-100, 100-150
For example, for the interval 0-70, p1 appears once, p2 3 times, p3 once and p4 twice. These would be the counts for the first stacked column in the plot.
What is the best way to approach this problem?

You can use cut() to define your intervals. Also, based on your description, I assume you want fill = c1 rather than fill = c2?
See if the following serves your purpose:
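library(dplyr)
library(ggplot2)
df %>%
  # bin c2 into the intervals (0,70], (70,100], (100,150]
  mutate(c2.intervals = cut(c2, breaks = c(0, 70, 100, 150))) %>%
  ggplot(aes(x = c2.intervals, fill = c1)) +
  # geom_bar() counts the rows in each interval and stacks them by c1
  geom_bar()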
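To check that the bins match the counts given in the question, the same cut() call can be cross-tabulated in base R. This is only a sketch: the name counts and the "0-70" style labels are my own choices, not part of the answer above.

```r
# Same data as in the question
c1 <- c("p2","p3","p1","p2","p1","p3","p4","p4","p4","p1",
        "p1","p2","p2","p3","p4","p2","p1","p4","p3","p3")
c2 <- c(41,146,79,107,131,127,32,88,119,148,
        32,65,36,23,44,76,100,98,121,104)
df <- data.frame(c1 = c1, c2 = c2)

# cut() builds right-closed intervals by default: (0,70], (70,100], (100,150].
# The labels argument (an assumption about the wanted axis text) renames them.
df$c2.intervals <- cut(df$c2,
                       breaks = c(0, 70, 100, 150),
                       labels = c("0-70", "71-100", "101-150"))

# Rows are the c1 groups, columns are the intervals;
# each column holds the heights of one stacked bar
counts <- table(df$c1, df$c2.intervals)
print(counts)
```

The first column should reproduce the counts from the question (p1 once, p2 three times, p3 once, p4 twice).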

Related

How to draw different line segment with different facets

I have a question about using geom_segment in R ggplot2.
For example, I have three facets and two clusters of points(points which have the same y values) in each facets, how do I draw multiple vertical line segments for each clustering with geom_segment?
Like if my data is
x <- (1:24)
y <- (rep(1,2),2,rep(2,2),1,rep(3,2),4, rep(4,1),5,6, ..rep(8,2),7)
facets <-(1,2,3)
factors <-(1,2,3,4,5,6)
xmean <- ( (1+2+3)/3, (4+5+6)/3, ..., (22+23+24)/3)
Note: (1+2+3)/3 is the mean of the first cluster in the first facet, (4+5+6)/3 is the mean of the second cluster in the first facet, and (7+8+9)/3 is the mean of the first cluster in the second facet.
My Code:
ggplot(, aes(x = as.numeric(x), y = as.numeric(y), color = factors)) +
geom_point(alpha = 0.85, size = 1.85) + facet_grid(~facets) +
geom_segment(what should I put here to draw this line in different factors?)
Desired result:
Please see the picture!
Please see the updated picture!
Thank you so much! Have a nice day :).
Maybe this is what you are looking for. Instead of working with vectors, put your data in a data frame. You can then easily build an aggregated data frame with the mean x value per facet and cluster, which makes it easy to draw the segments:
Note: Wasn't sure about the setup of your data. You talk about two clusters per facet but your data has 8. So I slightly changed the example data.
library(ggplot2)
library(dplyr)
df <- data.frame(
x = 1:24,
y = rep(1:6, each = 4),
facets = rep(1:3, each = 8)
)
df_sum <- df %>%
group_by(facets, y) %>%
summarise(x = mean(x))
#> `summarise()` has grouped output by 'facets'. You can override using the `.groups` argument.
ggplot(df, aes(x, y, color = factor(y))) +
geom_point(alpha = 0.85, size = 1.85) +
geom_segment(data = df_sum, aes(x = x, xend = x, y = y - .25, yend = y + .25), color = "black") +
facet_wrap(~facets)
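For readers who want to see what the aggregated data frame contains, here is a base R sketch of the same summary step. It swaps in aggregate() for group_by() and summarise(), which is my substitution, not what the answer uses.

```r
# Same example data as in the answer above
df <- data.frame(
  x = 1:24,
  y = rep(1:6, each = 4),
  facets = rep(1:3, each = 8)
)

# One row per (facets, y) cluster, holding the mean x of that cluster.
# These means are the x positions of the vertical segments.
df_sum <- aggregate(x ~ facets + y, data = df, FUN = mean)
print(df_sum)
```

Each facet ends up with two rows, one per cluster, so geom_segment() draws two vertical lines per panel.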

Labeling Issue in the Stacked Bar Plot using ggplot2 in R

I have been trying to create a stacked bar chart using the following code, but I am facing a problem while generating the plot. Here is the problem and the code for your reference:
#required packages
require(ggplot2)
require(dplyr)
require(tidyr)
#the data frame
myData <- data.frame(
a = c(70,113),
b = c(243, 238),
c = c(353, 219),
d = c(266, 148),
Gender = c("Male","Female"))
myData <- gather(myData,Age,Value,a:d)
myData <- group_by(myData,Gender) %>% mutate(pos = cumsum(Value) - (0.5 * Value))
# plot bars and add text
p <- ggplot(myData, aes(x = Gender, y = Value)) + geom_bar(aes(fill = Age),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 4)
p
This code produces this plot:
In this figure the "Female" bar is fine, but in the "Male" bar the two values "70" and "243" sit in the same box and the topmost segment is empty. The order of the numbers across the four groups is correct.
Why am I getting this? How can I correct this figure?
Notice how the numbers aren't in the right colors? By default the bars are stacked from top to bottom, and the stacking order is controlled by the order of the factor levels of the fill variable. To change the order in which Age is drawn, reverse the levels of Age:
myData <- gather(myData,Age,Value,a:d)
myData <- group_by(myData,Gender) %>%
mutate(pos = cumsum(Value) - (0.5 * Value),
Age=forcats::fct_rev(factor(Age)))
Then you will get the ordering of your bars that matches the cumsum that you calculated.
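As an alternative sketch, the label positions do not have to be computed by hand at all: position_stack(vjust = 0.5) lets ggplot2 centre each label inside its own segment, so the labels follow whatever stacking order the bars use. This assumes ggplot2 2.2.0 or later, and it builds the long data directly rather than with gather(), which is my simplification.

```r
library(ggplot2)

# Same numbers as in the question, already in long format
myData <- data.frame(
  Gender = rep(c("Male", "Female"), times = 4),
  Age    = rep(c("a", "b", "c", "d"), each = 2),
  Value  = c(70, 113, 243, 238, 353, 219, 266, 148)
)

p <- ggplot(myData, aes(x = Gender, y = Value, fill = Age)) +
  geom_col() +
  # group = Age makes the text layer stack in the same order as the bars;
  # vjust = 0.5 places each label at the middle of its segment
  geom_text(aes(label = Value, group = Age),
            position = position_stack(vjust = 0.5), size = 4)
p
```

With this approach the pos column and the fct_rev() call are no longer needed.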

Geom tile white space issue when the x variable is spread unevenly across facet grids

I'm trying to produce a heat map of gene expression from samples of different conditions, faceted by the conditions:
require(reshape2)
set.seed(1)
expression.mat <- matrix(rnorm(100*1000),nrow=100)
df <- reshape2::melt(expression.mat)
colnames(df) <- c("gene","sample","expression")
df$condition <- factor(c(rep("C1",2500),rep("C2",3500),rep("C3",3800),rep("C4",200)),levels=c("C1","C2","C3","C4"))
I'd like to color by expression range:
df$range <- cut(df$expression,breaks=6)
The width parameter in ggplot's aes is supposed to control the width of the different facets. My question is how to find the optimal width value such that the figure is not distorted?
I played around a bit with this plot command:
require(ggplot2)
ggplot(df,aes(x=sample,y=gene,fill=range,width=100))+facet_grid(~condition,scales="free")+geom_tile(color=NA)+labs(x="condition",y="gene")+theme_bw()
Setting width to be below 100 leaves gaps in the last facet (with the lowest number of samples), and already at this value of 100 you can see that the right column in the first facet from left is distorted (wider than the columns to its left):
So my question is how to fix this/find a width that doesn't cause this.
Edit showing the issue with the sample variable faceted by condition
There is no C1 sample between 25 and 100, because those sample numbers belong to C2, C3 and C4.
Here is an illustration for the sample < 200.
ggplot(df[df$sample < 200, ],
       aes(x = sample, y = gene, fill = range)) +
  geom_tile() +
  facet_grid(~condition)
The number of samples is not the same in all facets, and faceting on condition creates holes between the sample numbers within each condition.
One way around this problem is to create a new sample2 number that restarts at 1 within each condition. I use the dplyr package for this.
library(dplyr)
sample2 <- df %>%
group_by(condition) %>%
distinct(sample) %>%
mutate(sample2 = 1:n())
df <- df %>%
left_join(sample2, by = c("condition", "sample"))
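To make the renumbering concrete, here is a small base R sketch of the same idea on made-up data. The toy data frame and the use of ave() with match() are my own illustration, not part of the answer; the dplyr code above is what the answer actually uses.

```r
# Hypothetical toy data: sample numbers with gaps inside each condition
toy <- data.frame(
  sample    = c(1, 1, 2, 26, 26, 27, 99),
  condition = c("C1", "C1", "C1", "C2", "C2", "C2", "C4")
)

# Within each condition, replace every sample number by its rank
# among that condition's distinct samples (1, 2, 3, ...)
toy$sample2 <- ave(toy$sample, toy$condition,
                   FUN = function(s) match(s, unique(s)))
print(toy)
```

After this step sample2 is consecutive inside every condition, so no facet has empty columns.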
Then plot using sample2 as the x variable
ggplot(df,aes(x = sample2, y = gene,
fill = range))+
facet_grid(~condition) +
geom_tile(color=NA) + theme_bw()
Using the scales argument to vary scales on the x axis.
ggplot(df,aes(x = sample2, y = gene,
fill = range))+
facet_grid(~condition, scales = "free") +
geom_tile() + theme_bw()
Old answer using width
See for example this answer.
Adding a width aesthetic produces wider columns:
ggplot(df,aes(x = sample, y = gene,
fill = range, width = 50))+
facet_grid(~condition) +
geom_tile(color=NA) +
labs(x="condition",y="gene")+theme_bw()

How to do stacked bar plot in R? (including the value of the var)

I need your help.
I am trying to make a stacked bar plot in R and have not succeeded so far. I have read several posts, but without success.
As I am a newbie, this is the chart I want (I made it in Excel):
And this is how my data is laid out:
Thank you in advance.
I would use the ggplot2 package to create this plot, as it is easier to position text labels with it than with the base graphics package:
# First we create a dataframe using the data taken from your excel sheet:
myData <- data.frame(
Q_students = c(1000,1100),
Students_with_activity = c(950, 10000),
Average_debt_per_student = c(800, 850),
Week = c(1,2))
# The data in the dataframe above is in 'wide' format, to use ggplot
# we need to use the tidyr package to convert it to 'long' format.
library(tidyr)
myData <- gather(myData,
Condition,
Value,
Q_students:Average_debt_per_student)
# To add the text labels we calculate the midpoint of each bar and
# add this as a column to our dataframe using the package dplyr:
library(dplyr)
myData <- group_by(myData,Week) %>%
mutate(pos = cumsum(Value) - (0.5 * Value))
#We pass the dataframe to ggplot2 and then add the text labels using the positions which
#we calculated above to place the labels correctly halfway down each
#column using geom_text.
library(ggplot2)
# plot bars and add text
p <- ggplot(myData, aes(x = Week, y = Value)) +
geom_bar(aes(fill = Condition),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 3)
#Add title
p <- p + ggtitle("My Plot")
#Plot p
p
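The wide-to-long step can also be sketched with pivot_longer(), which the tidyr documentation recommends over gather() in new code. This assumes tidyr 1.0.0 or later; the names wide and long are mine.

```r
library(tidyr)

# Same data as in the answer above, in 'wide' format
wide <- data.frame(
  Q_students = c(1000, 1100),
  Students_with_activity = c(950, 10000),
  Average_debt_per_student = c(800, 850),
  Week = c(1, 2)
)

# Stack the three measurement columns into Condition/Value pairs,
# keeping Week as the identifier column
long <- pivot_longer(
  wide,
  cols = Q_students:Average_debt_per_student,
  names_to = "Condition",
  values_to = "Value"
)
print(long)
```

The result has one row per week and condition, which is the shape the ggplot() call above expects. The row order differs from gather(), but that does not affect the plot.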
so <- data.frame(week1 = c(1000, 950, 800), week2 = c(1100, 10000, 850),
                 row.names = c("Q students", "students with Activity", "average debt per student"))
# barplot() stacks the rows of a matrix within each column by default
barplot(as.matrix(so))

2 stacked histograms with a common x-axis

I want to plot two stacked histograms that share a common x-axis. I want the second histogram to be plotted as the inverse(pointing downward) of the first. I found this post that shows how to plot the stacked histograms (How to plot multiple stacked histograms together in R?). For the sake of simplicity, let's say I just want to plot that same histogram, on the same x-axis but facing in the negative y-axis direction.
You could count up cases and then multiply the count by -1 for one category. Example with data.table / ggplot
library(data.table)
library(ggplot2)
# fake data
set.seed(123)
dat <- data.table(value = factor(sample(1:5, 200, replace=T)),
category = sample(c('a', 'b'), 200, replace=T))
# count by val/category; cat b as negative
plot_dat <-
dat[, .(N = .N * ifelse(category=='a', 1, -1)),
by=.(value, category)]
# plot
ggplot(plot_dat, aes(x=value, y=N, fill=category)) +
geom_bar(stat='identity', position='identity') +
theme_classic()
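The same count-then-negate step can be sketched in base R for readers who do not use data.table. The as.data.frame(table(...)) route is my substitution; the resulting plot_dat can be passed to the ggplot() call above unchanged.

```r
# Fake data, built the same way as in the answer above
set.seed(123)
dat <- data.frame(
  value    = factor(sample(1:5, 200, replace = TRUE)),
  category = sample(c("a", "b"), 200, replace = TRUE)
)

# Count rows per value/category combination
plot_dat <- as.data.frame(
  table(value = dat$value, category = dat$category),
  responseName = "N"
)

# Flip the sign for category b so its bars point downward
plot_dat$N <- ifelse(plot_dat$category == "a", plot_dat$N, -plot_dat$N)
print(plot_dat)
```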
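You can try something like this:
library(ggplot2)
ggplot() +
  stat_bin(data = diamonds, aes(x = depth)) +
  # after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
  stat_bin(data = diamonds, aes(x = depth, y = -after_stat(count)))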
Responding to the additional comment:
library(dplyr)
library(tidyr)
d1 <- diamonds %>%
select(depth,table) %>%
gather(key = grp,value = val,depth,table)
ggplot() +
  stat_bin(data = d1, aes(x = val, fill = grp)) +
  stat_bin(data = diamonds, aes(x = price, y = -after_stat(count)))
Visually, that's a bad example because the scales of the variables are all off, but that's the general idea.
