R ggplot2 stacked barplot normalized by the value of a column

I have the following data frame t:
name type total
a 1 20
a 1 20
a 3 20
a 2 20
a 3 20
b 1 25
b 2 25
c 5 35
c 5 35
c 6 35
c 1 35
The total is identical for all entries with the same name.
I want to plot a stacked barplot with type on the x-axis and the count of each name, normalized by its total, on the y-axis.
I produced the non-normalized plot with the following:
ggplot(t, aes(type,fill= name))+geom_bar() + geom_bar(position="fill")
How can I plot the normalized barplot? That is, for type = 1 the y-axis value would be 2/20 for a, 1/25 for b, and 1/35 for c.
My attempt, which did not work:
ggplot(t, aes(type, ..count../t$total[1],fill= name))+geom_bar() + geom_bar(position="fill")

Read in the data
d <- read.table(header = TRUE, text =
'name type total
a 1 20
a 1 20
a 3 20
a 2 20
a 3 20
b 1 25
b 2 25
c 5 35
c 5 35
c 6 35
c 1 35')
It's a bad idea to call the data frame t, since t() is the transpose function, so it is named d here.
Calculate the fractions
library(dplyr)
d2 <- d %>%
group_by(name, type) %>%
summarize(frac = n() / first(total))
This is much easier to do using the dplyr package.
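If dplyr is not available, the same fractions can probably be computed in base R as well. The following is only a minimal sketch, not part of the original answer. It assumes, as stated in the question, that total is constant within each name, and it reuses the names d and d2 from above.

```r
# Same data as in the answer above
d <- read.table(header = TRUE, text =
'name type total
a 1 20
a 1 20
a 3 20
a 2 20
a 3 20
b 1 25
b 2 25
c 5 35
c 5 35
c 6 35
c 1 35')

# For every name/type combination, divide the number of rows by the
# (constant) total of that name; aggregate() passes the 'total' values
# of each group to FUN
d2 <- aggregate(total ~ name + type, data = d,
                FUN = function(x) length(x) / x[1])

# The result column is still called 'total'; rename it to 'frac'
names(d2)[names(d2) == "total"] <- "frac"
d2
```

For type 1 this gives 2/20 for a, 1/25 for b and 1/35 for c, which are the values asked for in the question.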
Make the plot
library(ggplot2)
ggplot(d2, aes(type, frac, fill = name)) +
  geom_bar(stat = 'identity')
Result

Related

Plotting multiple bar plots on same y-axis but each on separate x-axis in ggplot2 for count data

I have some count variables against which I want to make bar-plots on the same y-axis but I have no grouping variable. Something like the following plot
library(ggplot2)
B <- 25
iter_M1 <- c(5, 13, 14, 11, 7, 8, 10, 14, 10, 5, 7, 13, 10, 12, 4, 5, 9, 6, 5, 12, 8, 8, 7, 11, 9)
max_M1 <- max(iter_M1)
count_M1 <- integer(max_M1)
# count how often each value 1..max_M1 occurs in iter_M1
for (i in 1:max_M1) {
  for (j in 1:B) {
    if (iter_M1[j] == i)
      count_M1[i] <- count_M1[i] + 1
  }
}
count_M1
# [1] 0 0 0 1 4 1 3 3 2 3 2 2 2 2
df <- data.frame(x = 1:max_M1, y = count_M1)
p_M1 <- ggplot(data = df, aes(x = x, y = y)) + geom_bar(stat = "identity")
p_M1
This results in a plot like this
and another similar variable
iter_M2 <- c(3, 1, 3, 2, 6, 3, 4, 4, 3, 7, 4, 2, 2, 3, 4, 3, 4, 4, 1, 3, 7, 3, 2, 4, 2)
max_M2 <- max(iter_M2)
count_M2 <- integer(max_M2)
# count how often each value 1..max_M2 occurs in iter_M2
for (i in 1:max_M2) {
  for (j in 1:B) {
    if (iter_M2[j] == i)
      count_M2[i] <- count_M2[i] + 1
  }
}
count_M2
# [1] 2 5 8 7 0 1 2
df1 <- data.frame(x1 = 1:max_M2, y1 = count_M2)
p_M2 <- ggplot(data = df1, aes(x = x1, y = y1)) +
  geom_bar(stat = "identity")
p_M2
which results in a second plot as
and similar variables like these... How can I plot this data side by side? Also, with the way I have generated the data, there is no common y-axis shared by all the x-axes. Are there any suggestions for generating such a plot, or for putting the dataset into another format, to achieve the required plot?
As suggested in the comments, the easiest way is to combine the data and add a factor (class) column that identifies each variable, which lets you facet the plot.
But you seem to want explicitly just the same y-axis on separate plots. This is achievable with the scale limits. For example, build a vector of limits from the overall maximum and then use it in each plot.
ylimits <- c(0, max(c(count_M1, count_M2)))
p_M1 + ylim(ylimits)
p_M2 + ylim(ylimits)
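For the faceting route mentioned at the start of this answer, a rough sketch could look like the following. It is not from the original answer. The names long, variable and count are made up for illustration, and tabulate() is swapped in for the counting loops from the question.

```r
library(ggplot2)

iter_M1 <- c(5, 13, 14, 11, 7, 8, 10, 14, 10, 5, 7, 13, 10, 12, 4, 5, 9, 6, 5, 12, 8, 8, 7, 11, 9)
iter_M2 <- c(3, 1, 3, 2, 6, 3, 4, 4, 3, 7, 4, 2, 2, 3, 4, 3, 4, 4, 1, 3, 7, 3, 2, 4, 2)

# tabulate() counts how often each integer 1..max(x) occurs,
# which is what the nested loops in the question compute
count_M1 <- tabulate(iter_M1)
count_M2 <- tabulate(iter_M2)

# One long data frame with a column saying which variable a row belongs to
long <- rbind(
  data.frame(variable = "M1", x = seq_along(count_M1), count = count_M1),
  data.frame(variable = "M2", x = seq_along(count_M2), count = count_M2)
)

# One panel per variable; scales = "free_x" gives each panel its own
# x-axis while the y-axis stays common to all panels
p <- ggplot(long, aes(x = x, y = count)) +
  geom_col() +
  facet_wrap(~ variable, scales = "free_x")
p
```

geom_col() is equivalent to geom_bar(stat = "identity"). Further variables can be added as extra rows of long with their own variable label.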

How to plot a stacked area graph for grouped/categorized data in R?

I have a dataset that is categorized on multiple fields. Example:
Time | CatA | CatB | CatC | Value
---------------------------------
1 A X P 4
2 A X Q 6
3 A Y R 3
4 A Y P 7
1 B X Q 8
2 B X R 9
3 B Y P 5
1 A X Q 8
2 A X R 2
3 A Y P 6
4 A Y Q 4
5 A Y R 3
Now I want to plot a stacked area graph with time as the unit on the x-axis and, on the y-axis, the corresponding values stacked by the combination of categories. E.g. (A,X,P) will be one stack, (A,X,Q) another, (B,X,P) another, and so on.
How do I plot this in R? (PS: I'm a novice to R)
Is this the type of graph you are interested in?
Your data:
df<-read.table(header = TRUE, text = "Time CatA CatB CatC Value
1 A X P 4
2 A X Q 6
3 A Y R 3
4 A Y P 7
1 B X Q 8
2 B X R 9
3 B Y P 5
1 A X Q 8
2 A X R 2
3 A Y P 6
4 A Y Q 4
5 A Y R 3")
Code to create the plot:
library(ggplot2)
#combine all of the cat to a single label
df$cat<-paste(df$CatA, df$CatB, df$CatC)
ggplot(df, aes(x=cat, y=Value, fill=CatB)) + geom_bar(stat="identity") +
scale_fill_manual(values=c("#669933", "#FFCC66")) +
xlab("Cat: A,B and C")
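The code above draws stacked bars per category combination. If an area chart over Time is what is wanted, something along these lines might work. This is only a sketch under assumptions: it treats a combination that has no row at a given time as 0, which may or may not be appropriate for the real data, and the names sums, grid and full are made up here.

```r
library(ggplot2)

df <- read.table(header = TRUE, text = "Time CatA CatB CatC Value
1 A X P 4
2 A X Q 6
3 A Y R 3
4 A Y P 7
1 B X Q 8
2 B X R 9
3 B Y P 5
1 A X Q 8
2 A X R 2
3 A Y P 6
4 A Y Q 4
5 A Y R 3")

# Combine the three category columns into a single label
df$cat <- paste(df$CatA, df$CatB, df$CatC)

# Sum Value per Time/cat in case a combination occurs more than once
sums <- aggregate(Value ~ Time + cat, data = df, FUN = sum)

# geom_area() stacks cleanly only if every group has a value at every
# x position, so build the full Time x cat grid and fill the gaps with 0
# (assumption: a missing combination means 0)
grid <- expand.grid(Time = sort(unique(df$Time)),
                    cat = unique(df$cat),
                    stringsAsFactors = FALSE)
full <- merge(grid, sums, all.x = TRUE)
full$Value[is.na(full$Value)] <- 0

p <- ggplot(full, aes(x = Time, y = Value, fill = cat)) +
  geom_area()
p
```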

Colouring lines by using all available permutations of multiple category columns

I want to produce a line plot in R that contains multiple lines in different colours depending on other data in the table. My data looks similar to this:
mydf = data.frame(iteration=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
value=runif(20, min=0, max=1),
category=c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
subcategory=c("x","x","x","x","x","y","y","y","y","y","x","x","x","x","x","y","y","y","y","y"))
iteration value category subcategory
1 1 0.79813537 A x
2 2 0.45196396 A x
3 3 0.28001580 A x
4 4 0.65997486 A x
5 5 0.82217320 A x
6 1 0.33805127 A y
7 2 0.75842241 A y
8 3 0.18502805 A y
9 4 0.75586271 A y
10 5 0.28269372 A y
11 1 0.27585682 B x
12 2 0.45901786 B x
13 3 0.18962731 B x
14 4 0.63682207 B x
15 5 0.89821930 B x
16 1 0.93757079 B y
17 2 0.27272290 B y
18 3 0.20485397 B y
19 4 0.33647649 B y
20 5 0.07788958 B y
Now I would like to print four lines with ggplot into the same plot representing all available combinations of category and subcategory. I also want to have a different colour for each combination. For example: A.x red, A.y green, B.x blue, B.y yellow.
The best I could come up with was two colours and two shapes to distinguish between lines.
ggplot(data=mydf,
aes(x=iteration, y=value, colour=category, shape=subcategory)) +
geom_line() + geom_point()
Is there a way of assigning a colour for each possible permutation of category and subcategory?
Thanks!
(I would like to show some sample images but I don't have enough rep yet.)
Create a new variable holding both category and subcategory and map it to colour.
mydf$group <- paste(mydf$category,mydf$subcategory,sep="_")
ggplot(data=mydf,
aes(x=iteration, y=value, colour=group)) +
geom_line() + geom_point() +
scale_color_manual(values = c("red","green","blue","yellow"))
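A possible variation, sketched here rather than taken from the answer: interaction() builds the combined factor directly, and a named colour vector makes the colour of each combination explicit instead of relying on the order of the levels. The name pal is made up, and mydf is rebuilt with set.seed() only so the example is reproducible.

```r
library(ggplot2)

set.seed(1)  # only so the random values are reproducible
mydf <- data.frame(
  iteration   = rep(1:5, times = 4),
  value       = runif(20, min = 0, max = 1),
  category    = rep(c("A", "B"), each = 10),
  subcategory = rep(rep(c("x", "y"), each = 5), times = 2)
)

# interaction() creates one factor level per category/subcategory pair,
# labelled "A.x", "A.y", "B.x", "B.y"
mydf$group <- interaction(mydf$category, mydf$subcategory, sep = ".")

# Named values are matched to the levels by name, not by position
pal <- c(A.x = "red", A.y = "green", B.x = "blue", B.y = "yellow")

p <- ggplot(mydf, aes(x = iteration, y = value, colour = group)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = pal)
p
```

Naming the colours matters here because interaction() does not order its levels alphabetically, so an unnamed vector could assign the colours to different combinations than intended.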

Plotting a graph by grouping a column of a data frame

I have the following data frame
Value Phase
22 1
23 1
40 1
19 2
17 2
16 2
12 3
13 3
14 3
9 4
7 4
6 4
I want to see how the sum of Value changes from one phase to the next. The Phase column can range from 1 to 5. Going from phase 1 to 2 to 3 and so on, does the sum of Value for each phase decrease or increase? I want to use the base plotting system. How can I plot the graph so that the changes between phases are clear?
Here is how to do a line + scatter plot of the sums of Value for each value in Phase. First you need to aggregate the data by Phase. I'm providing both a base R solution (as you requested) and a ggplot solution.
df <- read.table(text = "Value Phase
22 1
23 1
40 1
19 2
17 2
16 2
12 3
13 3
14 3
9 4
7 4
6 4", header = TRUE)
sums <- aggregate(Value ~ Phase, df, sum, na.rm = TRUE)
png("sums.png", height = 540, width = 540)
plot(sums$Phase, sums$Value, xlab = "Phase", ylab = "Sum of Value")
lines(sums$Phase, sums$Value, type = "l")
dev.off()
# ggplot method
require(ggplot2)
ggplot(sums, aes(x = Phase, y = Value)) + geom_point() + geom_line()
ggsave("sums-ggplot.png")
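To put numbers on the increase or decrease between phases, the aggregated sums can also be differenced. This is a small base R sketch added for illustration. The column name Change is made up, and the bar chart is just one possible way to show the sums.

```r
df <- read.table(text = "Value Phase
22 1
23 1
40 1
19 2
17 2
16 2
12 3
13 3
14 3
9 4
7 4
6 4", header = TRUE)

# Sum of Value per Phase, as in the answer above
sums <- aggregate(Value ~ Phase, data = df, FUN = sum)

# Change relative to the previous phase (NA for the first phase)
sums$Change <- c(NA, diff(sums$Value))
sums
#   Phase Value Change
# 1     1    85     NA
# 2     2    52    -33
# 3     3    39    -13
# 4     4    22    -17

# Base graphics: one bar per phase
barplot(sums$Value, names.arg = sums$Phase,
        xlab = "Phase", ylab = "Sum of Value")
```

A negative Change means the sum dropped compared with the previous phase, which is the case for every step in this sample.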

R - Multiple plot with ggplot

I have this small dataset
map red_team blue_team
1 7 8
2 21 32
3 11 22
4 10 8
And I am trying to create a multiplot where each individual plot represents one of the maps (1, 2, 3 and 4) and contains two bars, one for red_team and one for blue_team, on the x-axis, with the score on the y-axis.
This is what I currently have.
ggplot(winners_and_score, aes(red_team)) + geom_bar() + facet_wrap(~ map)
I'm having trouble displaying the score for both teams.
Thanks.
require(reshape2)
require(ggplot2)
# toy data
df = data.frame(map = 1:4, red_team = sample(7:21, 4, replace=T),
blue_team = sample(8:32, 4, replace=T))
df.melted <- melt(df, id='map')
df.melted
# example output (values differ between runs because the toy data are sampled)
#   map  variable value
# 1   1  red_team     8
# 2   2  red_team    15
# 3   3  red_team    17
# 4   4  red_team    19
# 5   1 blue_team    22
# 6   2 blue_team    32
# 7   3 blue_team    31
# 8   4 blue_team    18
# making the plot
ggplot(data=df.melted, aes(x=variable, y=value, fill=variable)) +
geom_bar(stat='identity') +
facet_wrap(~map) +
theme_bw()
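The same idea applied to the scores from the question instead of random toy data, and without reshape2, might look like this. It is a sketch only. The names scores, long, team and score are made up, and the long format is built by hand with rep().

```r
library(ggplot2)

# Scores as given in the question
scores <- data.frame(map = 1:4,
                     red_team  = c(7, 21, 11, 10),
                     blue_team = c(8, 32, 22, 8))

# Long format: one row per map/team pair
long <- data.frame(
  map   = rep(scores$map, times = 2),
  team  = rep(c("red_team", "blue_team"), each = nrow(scores)),
  score = c(scores$red_team, scores$blue_team)
)

# One panel per map, one bar per team, coloured to match the team name
p <- ggplot(long, aes(x = team, y = score, fill = team)) +
  geom_col() +
  facet_wrap(~ map) +
  scale_fill_manual(values = c(red_team = "red", blue_team = "blue")) +
  theme_bw()
p
```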
