Plot boxplots over time using multiple categories - r

Apologies for the title; I was not sure how to phrase the question.
I have a data frame that looks like this.
Sample=c("A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B")
Treatment=c("twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook",
"twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook")
replicate=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
time=c( 10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20)
points=c(20,40,80,20,60,120, 30,100,55, 28, 45,90, 80,20,100, 40,90,56,20,30,12,3,5,8)
data <- data.frame(Sample, Treatment, replicate, time, points)
data
Sample Treatment replicate time points
1 A twiter 1 10 20
2 A twiter 2 10 40
3 A twiter 3 10 80
4 B twiter 1 10 20
5 B twiter 2 10 60
6 B twiter 3 10 120
7 A facebook 1 10 30
8 A facebook 2 10 100
9 A facebook 3 10 55
10 B facebook 1 10 28
11 B facebook 2 10 45
12 B facebook 3 10 90
13 A twiter 1 20 80
14 A twiter 2 20 20
15 A twiter 3 20 100
16 B twiter 1 20 40
17 B twiter 2 20 90
18 B twiter 3 20 56
19 A facebook 1 20 20
20 A facebook 2 20 30
21 A facebook 3 20 12
22 B facebook 1 20 3
23 B facebook 2 20 5
24 B facebook 3 20 8
I would like to plot my data using boxplots at each time point.
At time point 10 I would like four boxes side by side: Sample A with "twiter", Sample A with "facebook",
Sample B with "twiter", and Sample B with "facebook". The same four boxes should appear at time point 20.
So far I can do something like this.
ggplot(data,aes(x=time, y=points,color=Sample, fill=Sample, group=interaction(Sample,Treatment)), alpha=0.1) +
geom_boxplot(alpha=0.1) +
geom_point(position = position_dodge(width=0.75), alpha=0.2)+
theme_bw()
But this is wrong. I would like Samples A and B from the two different treatments to sit next to each other at each time point so I can compare them. I don't want to use facet_wrap. This has been a challenge for me. Thank you for your time.

Turning my comment into an answer: your issue is that group=interaction(Sample,Treatment) overrides the grouping by the x-axis (time) that would normally be done. To include time in the grouping, add it to the interaction:
ggplot(data,
       aes(
         x = time,
         y = points,
         color = Sample,
         fill = Sample,
         group = interaction(Sample, Treatment, time)
       ),
       alpha = 0.1) +
  geom_boxplot(alpha = 0.1) +
  geom_point(position = position_dodge(width = 0.75), alpha = 0.2) +
  theme_bw()
Of course, the issue remains that there's no way to tell which box goes with which treatment, but I'll leave that to you to address.
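One possible way to address that last point, offered as a sketch rather than as the answerer's own solution: map Treatment to an extra aesthetic such as linetype, so the four boxes at each time point can be told apart. The data frame is rebuilt here from the values in the question; the name p is just for illustration.

```r
library(ggplot2)

# Rebuild the question's data (values copied from the question).
data <- data.frame(
  Sample    = rep(rep(c("A", "B"), each = 3), times = 4),
  Treatment = rep(rep(c("twiter", "facebook"), each = 6), times = 2),
  replicate = rep(1:3, times = 8),
  time      = rep(c(10, 20), each = 12),
  points    = c(20, 40, 80, 20, 60, 120, 30, 100, 55, 28, 45, 90,
                80, 20, 100, 40, 90, 56, 20, 30, 12, 3, 5, 8)
)

# Colour and fill still encode Sample; linetype now encodes Treatment.
# time stays in the grouping so each time point gets its own set of boxes.
p <- ggplot(data,
            aes(x = time, y = points,
                color = Sample, fill = Sample,
                linetype = Treatment,
                group = interaction(Sample, Treatment, time))) +
  geom_boxplot(alpha = 0.1) +
  geom_point(position = position_dodge(width = 0.75), alpha = 0.2) +
  theme_bw()

p
```

Other aesthetics would work just as well; linetype is used here only because color and fill are already taken by Sample.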

Try this:
library(dplyr)
library(ggplot2)
#Plot
data %>%
  arrange(Sample) %>%
  mutate(Var = paste(Sample, Treatment),
         Var = factor(Var, levels = unique(Var), ordered = TRUE)) %>%
  ggplot(aes(x = time,
             y = points,
             color = Var, fill = Var,
             # include time in the grouping so each time point gets its own boxes
             group = interaction(Var, time)), alpha = 0.1) +
  geom_boxplot(alpha = 0.1) +
  geom_point(position = position_dodge(width = 0.75), alpha = 0.2) +
  theme_bw() +
  scale_color_manual(values = c('tomato', 'tomato', 'cyan3', 'cyan3')) +
  scale_fill_manual(values = c('tomato', 'tomato', 'cyan3', 'cyan3'))

If you don't mind making time a factor, you can do the following. Note that I turned your data into a data frame named 'dat'.
dat <- data.frame(Sample=c("A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B"),
Treatment=c("twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook",
"twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook"),
replicate=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
time=c( 10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20),
points=c(20,40,80,20,60,120, 30,100,55, 28, 45,90, 80,20,100, 40,90,56,20,30,12,3,5,8))
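library(dplyr)
library(ggplot2)
# Boxes are split by Sample within each time level; treatments are pooled
# unless Treatment is also mapped to an aesthetic (for example linetype).
dat %>%
  mutate(time = factor(time)) %>%
  ggplot(aes(x = time, y = points, color = Sample, fill = Sample), alpha = 0.1) +
  geom_boxplot(alpha = 0.1) +
  geom_point(position = position_dodge(width = 0.75), alpha = 0.2) +
  theme_bw()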

Related

Arranging the stacks in a stacked bargraph according to the value of one variable

I am trying to write a function that outputs a stacked bar graph whose bars are ordered from the greatest to the smallest percentage of one specific variable. I have not been able to find a general way to do this, and my ultimate goal is for the process to require as little human input as possible.
My data looks like this
Swimming_style Comfort_level_label Comfort_level_scale n Total_n Percentage
Front Crawl Excellent 3 7 10 70
Front Crawl Good 2 3 10 30
Backstroke Excellent 3 4 10 40
Backstroke Good 2 4 10 40
Backstroke Fair 1 1 10 10
Backstroke Poor 0 1 10 10
Brest stroke Excellent 3 6 10 60
Brest stroke Fair 1 4 10 40
Butterfly Good 2 7 10 70
Butterfly Fair 1 1 10 10
Butterfly Poor 0 2 10 20
So far, this is my code:
data <- arrange(data, Comfort_level_label, (Percentage))
data$Swimming_style <- factor(data$Swimming_style, levels = unique(data$Swimming_style))
ggplot(data, aes( x = Swimming_style, y = n, fill = Comfort_level_label)) +
geom_bar(position = "fill",stat = "identity") +
scale_y_continuous(labels = scales::percent_format())+
coord_flip()
Which outputs this:
But what I need the graph to do is sort by the Excellent rating from most Excellent on top to least or no Excellent on the bottom, and I'm having trouble doing exactly that.
The ordering rule can be a little ambiguous, but the important thing when prioritizing is to end up with exactly one row per Swimming_style, so that each style appears exactly once in the factor levels.
library(dplyr)
SS <- dat %>%
arrange(-Comfort_level_scale, -n) %>%
group_by(Swimming_style) %>%
slice(1) %>%
ungroup() %>%
arrange(Comfort_level_scale, n) %>%
pull(Swimming_style)
library(ggplot2)
dat %>%
mutate(Swimming_style = factor(Swimming_style, levels = SS)) %>%
ggplot(aes( x = Swimming_style, y = n, fill = Comfort_level_label)) +
geom_bar(position = "fill",stat = "identity") +
scale_y_continuous(labels = scales::percent_format()) +
coord_flip()
BTW: should Brest stroke be Breast stroke?
library(dplyr)
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Swimming_style Comfort_level_label Comfort_level_scale n Total_n Percentage
Front_Crawl Excellent 3 7 10 70
Front_Crawl Good 2 3 10 30
Backstroke Excellent 3 4 10 40
Backstroke Good 2 4 10 40
Backstroke Fair 1 1 10 10
Backstroke Poor 0 1 10 10
Brest_stroke Excellent 3 6 10 60
Brest_stroke Fair 1 4 10 40
Butterfly Good 2 7 10 70
Butterfly Fair 1 1 10 10
Butterfly Poor 0 2 10 20") %>%
mutate(Swimming_style = gsub("_", " ", Swimming_style))
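If the goal is literally "most Excellent on top", another option is to order the styles by their Excellent percentage directly, treating styles with no Excellent rating as 0. This is a base R sketch under that assumption, not the approach used in the answer above (which ranks by each style's top-rated row); the names excellent_share and style_levels are made up for the example.

```r
# Data as in the question (only the columns needed for the ordering).
dat <- data.frame(
  Swimming_style = c("Front Crawl", "Front Crawl",
                     "Backstroke", "Backstroke", "Backstroke", "Backstroke",
                     "Brest stroke", "Brest stroke",
                     "Butterfly", "Butterfly", "Butterfly"),
  Comfort_level_label = c("Excellent", "Good",
                          "Excellent", "Good", "Fair", "Poor",
                          "Excellent", "Fair",
                          "Good", "Fair", "Poor"),
  Percentage = c(70, 30, 40, 40, 10, 10, 60, 40, 70, 10, 20)
)

# Share of "Excellent" per style; styles without that rating count as 0.
excellent_share <- with(
  dat,
  tapply(ifelse(Comfort_level_label == "Excellent", Percentage, 0),
         Swimming_style, sum)
)

# Ascending order: after coord_flip() the last level is drawn at the top.
style_levels <- names(excellent_share)[order(excellent_share)]
dat$Swimming_style <- factor(dat$Swimming_style, levels = style_levels)

levels(dat$Swimming_style)
# "Butterfly" "Backstroke" "Brest stroke" "Front Crawl"
```

The releveled dat can then be passed to the same ggplot() call as before.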

ggplot sorting axis with flipped coordinates and faceted graph

I have a dataset (LDA output) that looks like this.
lda_tt <- tidy(ldaOut)
lda_tt <- lda_tt %>%
group_by(topic) %>%
top_n(10, beta) %>%
ungroup() %>%
arrange(topic, -beta)
topic term beta
1 1 council 0.044069733
2 1 report 0.020086205
3 1 budget 0.016918569
4 1 polici 0.01646605
5 1 term 0.015051927
6 1 annual 0.014938797
7 1 control 0.014316583
8 1 audit 0.013637803
9 1 rate 0.012732765
10 1 fund 0.011997421
11 2 debt 0.033760856
12 2 plan 0.030379431
13 2 term 0.02925229
14 2 fiscal 0.021836885
15 2 polici 0.017802904
16 2 mayor 0.015548621
17 2 transpar 0.013175692
18 2 relat 0.012997722
19 2 capit 0.012463813
20 2 long 0.011989227
21 2 remain 0.011989227
22 3 parti 0.031795751
23 3 elect 0.029929187
24 3 govern 0.025496098
25 3 mayor 0.023046232
26 3 district 0.014588364
27 3 public 0.014471704
28 3 administr 0.013596752
29 3 budget 0.011730188
30 3 polit 0.011730188
31 3 seat 0.010563586
32 3 state 0.010563586
33 4 budget 0.037069484
34 4 revenu 0.025043026
35 4 account 0.018459577
36 4 oper 0.01721546
37 4 tax 0.015867667
38 4 debt 0.014416198
39 4 compani 0.013690464
40 4 expenditur 0.012135318
41 4 consolid 0.011305907
42 4 increas 0.010891202
43 5 invest 0.026534237
44 5 elect 0.023341538
45 5 administr 0.022296654
46 5 improv 0.02189031
47 5 develop 0.019162003
48 5 project 0.017826874
49 5 transport 0.016375647
50 5 local 0.016317598
51 5 infrastr 0.014401978
52 5 servic 0.014111733
I want to create 5 plots by topic with terms ordered by beta. This is the code
lda_tt %>%
mutate(term = reorder(term, beta)) %>%
ggplot(aes(term, beta, fill = factor(topic))) +
geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
coord_flip()
I get this graph
As you can see, despite the sorting efforts, the terms are not ordered by beta: the term "budget", for example, should be the top term in topic 4, "invest" should be at the top of topic 5, etc. How can I sort the terms within each topic on each graph? There are several questions on Stack Overflow about ggplot sorting, but none of them helped me solve the problem.
The link suggested by Tung provides a solution to the problem. It seems that each term needs to be coded as a distinct factor level within each topic to get proper sorting. We can append "_" and the topic number to each term (done in lines 2 and 3 of the code), but display only the term without "_" and the topic number (the last line of code takes care of that). The following code generates a faceted graph with proper sorting.
lda_tt %>%
  mutate(term = factor(paste(term, topic, sep = "_"),
                       levels = rev(paste(term, topic, sep = "_")))) %>% # convert to factor
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  scale_x_discrete(labels = function(x) gsub("_.+$", "", x)) # remove "_" and topic number
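To see why a single reorder(term, beta) cannot sort every facet, here is a small base R sketch on made-up data (the toy values are hypothetical, not taken from the LDA output). A term that appears in several topics gets only one position in a global ordering, while a term/topic key gets its own position per topic.

```r
# Toy data (hypothetical values): "budget" appears in both topics.
toy <- data.frame(
  topic = c(1, 1, 2, 2),
  term  = c("budget", "council", "debt", "budget"),
  beta  = c(0.01, 0.02, 0.03, 0.09)
)

# Global ordering: each term gets a single position, based on its mean beta.
# "budget" ends up last (mean 0.05) even though it is the smallest in topic 1.
global_levels <- levels(reorder(toy$term, toy$beta))
global_levels
# "council" "debt" "budget"

# One key per term/topic pair, ordered by beta within each topic.
toy$key <- paste(toy$term, toy$topic, sep = "_")
per_topic_levels <- toy$key[order(toy$topic, toy$beta)]
toy$key <- factor(toy$key, levels = per_topic_levels)
per_topic_levels
# "budget_1" "council_1" "debt_2" "budget_2"
```

With free scales in facet_wrap(), each panel only shows its own keys, so the within-topic order is preserved; the labels function then strips the suffix for display.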

Grouped barplot side by side

I'm trying to plot the table below using a grouped barplot with ggplot2.
How do I plot it such that Scheduled Audits and Noofemails are plotted side by side for each day?
Email Type Sent Month Sent Day Scheduled Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21
ggplot(joined, aes(x=`Sent Day`, y=`Scheduled Audits`, fill = Noofemails )) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = c(1:29)) +
ggtitle("Number of emails sent in February") +
theme_classic()
Does not achieve the plot I hope to see.
Using this data format, with slightly different column names so that back-ticks are no longer needed. read.table(text = "") is a nice way to share little datasets on Stack Overflow.
joined <- read.table(text =
"ID Email_Type Sent_Month Sent_Day Scheduled_Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21",
header = TRUE)
This is why ggplot2 prefers long data over wide data: aesthetics are mapped to columns, so the variable names need to be in one column and the values in another.
You can use tidyr::gather() to rearrange the two columns of interest into one column of labels and one of values. This increases the number of rows in the data frame, which is why the format is called long.
long <- tidyr::gather(joined,"key", "value", Scheduled_Audits, Noofemails)
ggplot(long, aes(Sent_Day, value, fill = key)) +
geom_col(position = "dodge")
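In recent versions of tidyr, gather() is marked as superseded in favour of pivot_longer(). A roughly equivalent reshaping step, as a sketch that assumes a tidyr version providing pivot_longer(), would be:

```r
library(tidyr)

joined <- read.table(text =
"ID Email_Type Sent_Month Sent_Day Scheduled_Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21",
header = TRUE)

# Stack the two count columns: "key" holds the original column name,
# "value" holds the count. Each input row becomes two output rows.
long <- pivot_longer(joined,
                     cols = c(Scheduled_Audits, Noofemails),
                     names_to = "key",
                     values_to = "value")

nrow(long)
```

The row order differs from gather(), but the plotting code that follows works the same way on either result.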
Alternatively, you can use the melt() function from the reshape2 package. See the example below.
library("ggplot2")
library(reshape2)
joined2 <- melt(joined[,c("Sent_Day", "Noofemails", "Scheduled_Audits")], id="Sent_Day")
ggplot(joined2, aes(x=`Sent_Day`, y= value, group = variable, fill= variable)) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = c(1:29)) +
ggtitle("Number of emails sent in February") +
theme_classic()

R scatter plot by shape, colour and fill

I'm very new to R and I'm trying to build a scatter plot that codes my data according to shape, colour and fill. I want 5 different colours, 3 different shapes, and these to be either filled or not filled (in a non-filled point, I would still want the shape and the colour).
My data looks basically like this:
blank.test <- read.table(header=T, text="Colour Shape Fill X13C X15N
1 B B A 16 10
2 D A A 16 12
3 E A B 17 14
4 C A A 14 18
5 A A B 13 18
6 C B B 18 13
7 E C B 10 12
8 E A B 11 10
9 A C B 14 13
10 B A A 11 14
11 C B A 11 10
12 E B A 11 19
13 A B A 10 18
14 A C B 17 16
15 E B A 16 13
16 A C A 16 14")
If I do this:
ggplot(blank.test, aes(x=X13C, y=X15N,size=5)) +
geom_point(aes(shape=Shape,fill=Fill,color=Colour))
I get no filled or unfilled data points.
I did a little research and it looked like the problem was with the symbols themselves, which cannot take different settings for line and fill; it was recommended that I use pch shapes between 21 and 25.
But if I do this:
ggplot(blank.test, aes(x=X13C, y=X15N,color=(Colour), shape=(Shape),fill=(Fill),size=5)) +
geom_point() + scale_shape_manual(values=c(21,22,25))
I still don't get what I want
I also tried playing around with scale_fill_manual without any good result.
As far as I know, fill only applies to point shapes 21 to 25. What I would do is create an interaction between fill and shape and use this new factor to define your shape and filled/open symbols:
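# lex.order = TRUE keeps the two Fill levels of each Shape adjacent (A.A, A.B, B.A, ...),
# which matches the open/filled pairs passed to scale_shape_manual()
blank.test$inter <- with(blank.test, interaction(Shape, Fill, lex.order = TRUE))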
and then for your plot I would use something like that
ggplot(blank.test, aes(x=X13C, y=X15N)) +
  geom_point(aes(shape = inter, color = Colour)) +
  scale_shape_manual(name = "shape", values = c(0, 15, 1, 16, 2, 17)) +
  scale_color_manual(name = "colour", values = c("red", "blue", "yellow", "green", "purple"))
I can get the plot to work just fine, but the legend seems to absolutely insist on being black for fill. I can't figure out why. Maybe someone else has the answer to that one.
The 5 appearing in the legend is caused by having size inside aes(), where only elements that change with your data belong.
Here is some example code:
ggplot(blank.test, aes(x = X13C, y = X15N, color = Colour, shape = Shape, fill = Fill)) +
geom_point(size = 5, stroke = 3) +
scale_shape_manual(values=c(21,22,25)) +
scale_color_brewer(palette = "Set2") +
scale_fill_brewer(palette = "Set1") +
theme_bw()
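On the legend keys showing up black for fill: a likely explanation is that the legend keys are drawn with the default point shape, which ignores fill. One common workaround, sketched here with the question's data, is to override the shape used in the fill legend with a fillable one such as 21.

```r
library(ggplot2)

blank.test <- read.table(header = TRUE, text = "Colour Shape Fill X13C X15N
1   B   B   A   16  10
2   D   A   A   16  12
3   E   A   B   17  14
4   C   A   A   14  18
5   A   A   B   13  18
6   C   B   B   18  13
7   E   C   B   10  12
8   E   A   B   11  10
9   A   C   B   14  13
10  B   A   A   11  14
11  C   B   A   11  10
12  E   B   A   11  19
13  A   B   A   10  18
14  A   C   B   17  16
15  E   B   A   16  13
16  A   C   A   16  14")

p <- ggplot(blank.test,
            aes(x = X13C, y = X15N, color = Colour, shape = Shape, fill = Fill)) +
  geom_point(size = 5, stroke = 2) +
  scale_shape_manual(values = c(21, 22, 25)) +
  # Draw the fill legend keys with shape 21 so the fill colour is visible.
  guides(fill = guide_legend(override.aes = list(shape = 21))) +
  theme_bw()

p
```

The override only changes how the legend keys are drawn; the points in the panel keep the shapes set by scale_shape_manual().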

use ggplot to plot a panel of bar plots

I have a data frame which reads as below:
factor bin ret
1 beta 1 -0.026840807
2 beta 2 -0.051610137
3 beta 3 -0.044658901
4 beta 4 -0.053322048
5 beta 5 -0.060173704
6 size 1 -0.047448288
7 size 2 -0.045603776
8 size 3 -0.051804757
9 size 4 -0.047044614
10 size 5 -0.045720971
11 liquidity 1 -0.057657070
12 liquidity 2 -0.053105474
13 liquidity 3 -0.045501401
14 liquidity 4 -0.048572585
15 liquidity 5 -0.032209038
16 nonlinear 1 -0.045752503
17 nonlinear 2 -0.047673201
18 nonlinear 3 -0.051107792
19 nonlinear 4 -0.045364070
20 nonlinear 5 -0.047722148
21 btop 1 -0.004399745
22 btop 2 -0.035082069
23 btop 3 -0.054526058
24 btop 4 -0.063497535
25 btop 5 -0.077123859
I would like to plot a panel of charts which looks similar to this:
The difference is that the chart I would like to create would have bin as the x-axis and ret as the y-axis, and the charts should be bar plots. Could anyone help me with this?
FYI: The code for the sample plot I've included is:
print(ggplot(df, aes(date,value)) +ylab('return(bps)') + geom_line() + facet_wrap(~ series,ncol=input$numCol)+theme(strip.text.x = element_text(size = 20, colour = "red", angle = 0)))
I wonder if minor change to the code could solve my problem.
From your description, I'll assume this is what you're after:
print(ggplot(df, aes(bin, ret)) +
        ylab('return(bps)') +
        geom_bar(stat = "identity") +
        facet_wrap(~ factor, ncol = 2) +
        theme(strip.text.x = element_text(size = 20, colour = "red", angle = 0)))
