Plot three grouped columns of data in R ggplot2

I have data that resembles the following:
Ref Var Obs
A A 2
A C 6
A T 8
A G 2
C A 9
C C 1
C T 8
C G 4
T A 6
T C 1
T T 9
T G 6
G A 3
G C 1
G T 7
G G 2
I am trying to use qplot to plot the data, but I'm not sure how to display three columns of information instead of just two, and in a grouped manner. I would like a bar plot with the number of observations (Obs) on the y-axis and Var on the x-axis, grouped by Ref. The following is the idea of what I am trying to do:

If I understood your graphic correctly, I suggest this:
Your data:
seq=c("A", "C", "T", "G")
df=data.frame('Ref'=rep(seq, each=4), 'Var'=rep(seq, 4), 'Obs'=rpois(16, 2))
The plot:
ggplot(data=df) + aes(x=Ref, group=Var, y=Obs) + geom_bar(stat='identity', position="dodge", fill="lightblue", color="black")
Rendering:
Or, if you need to see the complete axis labels, you can use faceting:
ggplot(data=df) + aes(x=Var, y=Obs) +
geom_bar(stat='identity', position="dodge", fill="lightblue", color="black") +
facet_grid(~Ref)
Last remark: if you want to change the order of the bars, just modify the levels of the factor variables.
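As a minimal sketch of that last remark, the following uses the Obs values from the question and sets the factor levels explicitly, so that bars and panels follow the A, C, T, G order instead of the alphabetical default. The object names `bases` and `p_ordered` are made up for this illustration.

```r
library(ggplot2)

# Data in the same shape as the question (Obs values copied from it)
bases <- c("A", "C", "T", "G")
df <- data.frame(
  Ref = rep(bases, each = 4),
  Var = rep(bases, times = 4),
  Obs = c(2, 6, 8, 2, 9, 1, 8, 4, 6, 1, 9, 6, 3, 1, 7, 2)
)

# Character columns are ordered alphabetically (A, C, G, T) by default.
# Setting the factor levels explicitly controls the bar and panel order.
df$Ref <- factor(df$Ref, levels = bases)
df$Var <- factor(df$Var, levels = bases)

p_ordered <- ggplot(df, aes(x = Var, y = Obs)) +
  geom_col(fill = "lightblue", color = "black") +
  facet_grid(~Ref)
```

Printing `p_ordered` draws the plot; swapping the order of `bases` before the `factor()` calls changes the order of both the bars and the facets.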

Related

Plot line on ggplot2 grouped bar chart

I have this data frame:
`Last Name` Feature Value
<chr> <chr> <dbl>
1 Name1 Resilience 1
2 Name2 Resilience 6
3 Name3 Resilience 2
4 Name1 Self-Discipline 3
5 Name2 Self-Discipline 7
6 Name3 Self-Discipline 4
7 Name1 Assertiveness 6
8 Name2 Assertiveness 7
9 Name3 Assertiveness 6
10 Name1 Activity level 4
and created a grouped barplot with the following code:
bar2 <- ggplot(team_sih_PP1, aes(x=Feature, y=Value, fill =`Last Name`)) + geom_bar(stat="identity", position="dodge") + coord_cartesian(ylim=c(1,7)) + scale_y_continuous(n.breaks = 7) +scale_fill_manual(values = c("#2a2b63", "#28d5ac", "#f2eff2")) + theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
I also created a new data frame that holds the average values of the 3 Last Names in each Feature:
mean_name means
1 Action 4.000000
2 Reflection 4.000000
3 Flexibility 3.666667
4 Structure 3.666667
I want to add a line that shows the means of each Feature so that it looks something like this:
I managed to plot just the line but not in the bar chart, please help!
Assuming you have your code correct for geom_line() to add to your plot, you will not see anything plotted unless you set the group aesthetic the same across your plot (ex. aes(group=1)). This is because your x axis is made of discrete values, and ggplot does not know that they are connected with your data via a line. When you set group=1 in the aesthetic, it forces ggplot2 to recognize that the entire dataset is tied together, and then the points of your line will be connected.
I'd show this using the data you shared, but it does not reproduce the plots you've shown, so here's a representative example.
x_values <- c('These', 'Values', "are", "ordered", "but", "discrete")
set.seed(8675309)
df <- data.frame(
x=rep(x_values, 2),
type=rep(c("A", "B"), each =6),
y=sample(1:10, 12, replace=TRUE)
)
df$x <- factor(df$x, levels=x_values)
d_myline <- data.frame(
x=x_values,
rando=c(1,5,6,10,4,6)
)
p <- ggplot(df, aes(x,y)) +
geom_col(aes(fill=type), position="dodge", width=0.5)
The following code will not create a line on the plot (you won't get an error either, it just won't appear):
p + geom_line(data=d_myline, aes(x=x, y=rando))
However, if you set group=1, it shows the line as expected:
p + geom_line(data=d_myline, aes(x=x, y=rando, group=1))
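The same idea can be sketched on data shaped like the question's. This assumes the first nine rows shown in the question and computes the per-feature means with `aggregate()`. The column names `name`, `feature` and `value` and the object names are simplified stand-ins, not the asker's actual objects.

```r
library(ggplot2)

# First nine rows of the question's data, with simplified column names
scores <- data.frame(
  name    = rep(c("Name1", "Name2", "Name3"), times = 3),
  feature = rep(c("Resilience", "Self-Discipline", "Assertiveness"), each = 3),
  value   = c(1, 6, 2, 3, 7, 4, 6, 7, 6)
)

# One mean per feature; the result has columns `feature` and `value`,
# so the line layer can inherit the x and y mappings from the plot
feature_means <- aggregate(value ~ feature, data = scores, FUN = mean)

p_means <- ggplot(scores, aes(x = feature, y = value)) +
  geom_col(aes(fill = name), position = "dodge") +
  # group = 1 joins the discrete x positions into a single line
  geom_line(data = feature_means, aes(group = 1)) +
  geom_point(data = feature_means)
```

Because `fill` is mapped only inside `geom_col()`, the line and point layers do not need a `name` column in `feature_means`.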

How to plot lines and dots in the same plot while using different sized data

This toy data frame represents my data.
Time Gene Value
1 0 A 1
2 1 A 2
3 2 A 3
4 0 B 1
5 1.2 B 2
6 1.7 B 2
7 2.1 B 2
8 3 B 2
Using the following code I can turn this into a line plot with two lines, one for A and one for B.
ggplot(data=Data, aes(x=Time, y=Value, group=Gene)) +
geom_line(aes(color=Gene), linetype="longdash", size=2)+
theme_classic()+
labs(title= paste("Genes over time course"),
x="Time",
y="Expression")+
theme(plot.title=element_text(size=20, face="bold",hjust = 0.5),
axis.text.x=element_text(size=10),
axis.text.y=element_text(size=10),
axis.title.x=element_text(size=15),
axis.title.y=element_text(size=15),
legend.text=element_text(size=10))
However, I would like Gene A to be represented by only dots, and Gene B to be represented by only a line. How can I accomplish this given the data?
Using data=~subset(., ...) we can control which data goes to each layer.
ggplot(Data, aes(x = Time, y = Value, color = Gene, group = Gene)) +
geom_line(data = ~ subset(., Gene != "A")) +
geom_point(data = ~ subset(., Gene == "A"))
(You can also use dplyr::filter in place of subset; the results are the same.)
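A layer's `data` argument also accepts an ordinary function, which ggplot2 calls with the plot's data. The sketch below rebuilds the toy data from the question and uses a small helper, `only_gene()` (a hypothetical name introduced here), to pick the rows for each layer without the formula shorthand.

```r
library(ggplot2)

# Toy data from the question
Data <- data.frame(
  Time  = c(0, 1, 2, 0, 1.2, 1.7, 2.1, 3),
  Gene  = rep(c("A", "B"), times = c(3, 5)),
  Value = c(1, 2, 3, 1, 2, 2, 2, 2)
)

# Hypothetical helper: returns a function that keeps the rows of one gene
only_gene <- function(gene) {
  function(d) d[d$Gene == gene, ]
}

p_genes <- ggplot(Data, aes(x = Time, y = Value, color = Gene, group = Gene)) +
  geom_line(data = only_gene("B")) +   # Gene B as a line only
  geom_point(data = only_gene("A"))    # Gene A as dots only
```

Each layer receives only its own rows, so the two genes can have different numbers of observations and different time points.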

In a ggplot2 geom_tile plot, is it possible to dodge the positions of tiles?

I'm trying to produce a bar plot where the bars fade vertically according to a third variable, and I'm using geom_tile to enable this. However, I have multiple bars for a given category on the x-axis, and I'd like to dodge their positions to put alike x values together in groups of bars which don't overlap.
Is it possible to use position='dodge' or similar with geom_tile and, if so, what's wrong with my syntax?
a <- data.frame(x = factor(c(rep('a',5), rep('a',5), rep('b',5), rep('c',5))),
y = c(1:5, 1:5, 1:5, 1:5),
z = c(5:1, c(5,4,4,4,1), 5:1, 5:1)
)
ggplot(a, aes(x = x, y = y, group = x)) +
geom_tile(aes(alpha = z, fill = x, width = 1),
position = 'dodge')
The example data frame a looks like this:
x y z
1 a 1 5
2 a 2 4
3 a 3 3
4 a 4 2
5 a 5 1
6 a 1 5
7 a 2 4
8 a 3 4
9 a 4 4
10 a 5 1
11 b 1 5
12 b 2 4
13 b 3 3
14 b 4 2
15 b 5 1
16 c 1 5
17 c 2 4
18 c 3 3
19 c 4 2
20 c 5 1
...and the resulting graph from the current code has no gaps between the x values, and the two bars where x is a are drawn on top of one another:
I want those two bars where x is 'a' to be drawn as separate bars.
This is a mock-up of what I want the result to look like. The data are not correct for either of the a columns but it shows the grouping on the x-axis which is desired:
EDIT 2
To get your desired effect, use geom_bar() but be sure to change the y data to indicate the bar height, in this case 1. The reason is that the bars get stacked, so there is no need to specify the y-axis position, but instead specify the height.
Try this:
library(ggplot2)
a <- data.frame(x = factor(c(rep('a',5), rep('a',5), rep('b',5), rep('c',5))),
y = 1,
z = c(5:1, c(5,4,4,4,1), 5:1, 5:1)
)
a$bar <- rep(1:4, each=5)
ggplot(a, aes(x = factor(bar), y=y, fill=x, alpha=z)) +
geom_bar(stat="identity") +
facet_grid(~x, space="free", scale="free")
You should get:
EDIT 1
You can get close to what you describe by:
Explicitly adding another column that differentiates different bars in the same category
Using faceting
For example:
a$bar <- rep(1:4, each=5)
ggplot(a, aes(x = factor(bar), y = y, fill=x, alpha=z)) +
geom_bar(stat="identity", position="dodge") +
facet_grid(~x, space="free", scale="free")
ORIGINAL
You can use geom_bar() for this, by using stat="identity":
ggplot(a, aes(x = x, y = y, fill=x, alpha=z)) +
geom_bar(stat="identity", position="dodge")

Combined positive and negative stacked line plots

I am trying to make a stacked line plot in ggplot2 with positive values stacked above the x-axis and negative values stacked separately below the x-axis. I have had success stacking each of the line types separately, but have not been able to have both on a single plot. I'm looking for some help on how I can do this, either by overlaying plots or doing something creative on a single plot.
My code below uses a simple ggplot with stacked geom_line plot. Half of the "Types" are positive values with respect to time and the other half of the "Types" are all negative values.
p <- ggplot(dataForm, aes(x=Time,y=Value,group=Type),colour=factor(Type))
p + geom_line(aes(fill = Type),position = "stack")
I have tried an alternative of specifying the positive and negative values separately without success:
p <- ggplot(dataForm, aes(x=Time,y=Value,group=Type),colour=factor(Type))
p + geom_line(data = data1,aes(fill = Type),position = "stack")
p + geom_line(data = data1,aes(fill = Type),position = "stack")
Any advice on how to do this is greatly appreciated. Thanks.
In the absence of a reproducible example, I adapted this example from learnr:
library(ggplot2)
library(plyr)
data = read.table(text="Time Type Value
1 a 8
2 a 10
3 a 10
4 a 5
5 a 3
1 b 9
2 b 5
3 b 7
4 b 8
5 b 3
1 c -3
2 c -1
3 c -5
4 c -4
5 c -7
1 d -11
2 d -3
3 d -9
4 d -6
5 d -6", header=TRUE)
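# The `subset = .()` layer argument is not available in current ggplot2
# releases, so each layer is given its own filtered data instead.
p <- ggplot(data, aes(x=Time))
# Positive types, stacked above the x-axis
p <- p + geom_line(data = subset(data, Type %in% c('a', 'b')),
aes(y=Value, colour = Type),
position = 'stack')
# Negative types, stacked separately below the x-axis
p <- p + geom_line(data = subset(data, Type %in% c('c', 'd')),
aes(y=Value, colour = Type),
position = 'stack')
p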
To produce this:
And, for good measure, an area chart with a horizontal line:
# As above, each layer gets its own filtered data instead of `subset = .()`.
p <- ggplot(data, aes(x=Time))
p <- p + geom_area(data = subset(data, Type %in% c('a', 'b')),
aes(y=Value, fill=Type),
position = 'stack')
p <- p + geom_area(data = subset(data, Type %in% c('c', 'd')),
aes(y=Value, fill = Type),
position = 'stack')
# Horizontal reference line at zero
p <- p + geom_hline(yintercept=0)
p

Getting percentage using histogram when used with facetting

I have the following data frame
z x y
1 1 a
2 2 a
3 1 a
4 2 a
5 1 b
6 9 b
7 9 b
8 8 b
9 7 b
when I do
p = ggplot(z,aes(x,group=y)) + geom_histogram(aes(y = ..density..,group=y)) + facet_grid(y ~ .)
p
I get the faceted plots, but not with the percentages on the y-axis for each symbol within z$y.
Basically, I want a histogram chart, but with the percentages that show the frequency distribution within each value of z$y i.e. a,b.
In this case, under 'a', 50% is 1 and 50% is 2, and under 'b', 20% is 1, 40% is 9, 20% is 7 and 20% is 8. I want this charted as histograms using faceting.
That is not a histogram (there is no density estimation), but a bar chart.
d <- data.frame(
value = c(1,2,1,2,1,9,9,8,7),
group = c(rep("a",4),rep("b",5))
)
# With counts
ggplot(d) + geom_bar(aes(factor(value))) + facet_grid(group ~ .)
# With percentages
ggplot(d) +
geom_bar(aes(factor(value), (..count..)/sum(..count..))) +
scale_y_continuous(formatter = 'percent') +
facet_grid(group ~ .)
Note: In more recent versions of ggplot2 we would use scale_y_continuous(labels = percent_format()) instead, and make sure to load the scales package.
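With current ggplot2, one way to make sure the percentages are computed within each facet is to calculate the shares before plotting. This is a sketch under that assumption, using the nine rows from the question. The names `shares` and `p_pct` are made up here, and it is a swapped-in approach rather than the method used in the answer above.

```r
library(ggplot2)

# The nine rows from the question
z <- data.frame(
  x = c(1, 2, 1, 2, 1, 9, 9, 8, 7),
  y = rep(c("a", "b"), times = c(4, 5))
)

# Share of each x value within its own y group (each group sums to 1)
shares <- as.data.frame(prop.table(table(y = z$y, x = z$x), margin = 1))
shares <- shares[shares$Freq > 0, ]   # drop combinations that never occur

p_pct <- ggplot(shares, aes(x = x, y = Freq)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(y ~ .)
```

Here `shares$Freq` holds 0.5 for each of the two values under 'a' and 0.2, 0.2, 0.2 and 0.4 for the values 1, 7, 8 and 9 under 'b', matching the percentages described in the question.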
