Grouped barplot side by side - r

I'm trying to plot the table below using a grouped barplot with ggplot2.
How do I plot it so that Scheduled Audits and Noofemails appear side by side for each day?
Email Type Sent Month Sent Day Scheduled Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21
ggplot(joined, aes(x=`Sent Day`, y=`Scheduled Audits`, fill = Noofemails )) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = c(1:29)) +
ggtitle("Number of emails sent in February") +
theme_classic()
Does not achieve the plot I hope to see.

I'm using this data format, with slightly changed column names so that back-ticks are no longer needed. read.table(text = "") is a nice way to share small datasets on Stack Overflow.
joined <- read.table(text =
"ID Email_Type Sent_Month Sent_Day Scheduled_Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21",
header = TRUE)
This is why ggplot2 prefers long data to wide data: it needs a single column to map to each aesthetic.
So you can use the function tidyr::gather() to rearrange the two columns of interest into one column of labels (key) and one column of values (value). This increases the number of rows in the data frame, which is why the format is called long.
long <- tidyr::gather(joined,"key", "value", Scheduled_Audits, Noofemails)
ggplot(long, aes(Sent_Day, value, fill = key)) +
geom_col(position = "dodge")
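In recent tidyr releases gather() is documented as superseded by pivot_longer(). The following is a minimal sketch of the same reshape, assuming tidyr 1.0.0 or later is installed; the name long2 is only illustrative, and "key"/"value" are chosen to mirror the gather() call above.

```r
library(tidyr)
library(ggplot2)

# Same example data as above
joined <- read.table(text =
"ID Email_Type Sent_Month Sent_Day Scheduled_Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21",
header = TRUE)

# Stack the two measure columns into a label column and a value column
long2 <- pivot_longer(joined,
                      cols = c(Scheduled_Audits, Noofemails),
                      names_to = "key",
                      values_to = "value")

# Same plot as with the gather() result
p <- ggplot(long2, aes(Sent_Day, value, fill = key)) +
  geom_col(position = "dodge")
```

Each of the 9 original rows contributes two rows to long2, one per measure.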

Alternatively, you can use the melt() function from the reshape2 package. See the example below.
library("ggplot2")
library(reshape2)
joined2 <- melt(joined[,c("Sent_Day", "Noofemails", "Scheduled_Audits")], id="Sent_Day")
ggplot(joined2, aes(x=`Sent_Day`, y= value, group = variable, fill= variable)) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = 1:31) + # the data include days 29 to 31
ggtitle("Number of emails sent in February") +
theme_classic()

Related

Create heatmap with range of colors in a single cell in R

# dataframe
df1 <- df %>%
mutate(valuesrange=cut(values, breaks=c(0,0.05,10,100,1000,2000,3000, max(values, na.rm=T)),
labels=c("0-0.05", "0.05-10", "10-100", "100-1000", "1000-2000", "2000-3000", ">3000"))) %>%
mutate(valuesrange=factor(as.character(valuesrange), levels=rev(levels(valuesrange))))
#Order for X and Y axis labels
df1$objx <- factor(df1$objx, levels=unique(df1$objx))
df1$objy <- factor(df1$objy, levels=unique(df1$objy))
ggplot(data = df1, aes(x=objx, y=objy, fill = valuesrange)) +
geom_tile()+
scale_fill_manual(values=rev(brewer.pal(7, "YlGnBu")), na.value="grey90")
The df1 data looks like this
objy objx values valuesrange
1 1 15 1219 1000-2000
2 1 15 3911 >3000
3 1 15 3224 >3000
4 1 15 14708 >3000
5 1 15 5054 >3000
6 1 15 31499 >3000
7 1 15 1131 1000-2000
8 1 15 4368 >3000
9 1 15 2749 2000-3000
10 1 15 666. 100-1000
11 1 15 1982 1000-2000
I would like to create a heatmap of the df1 data with a single tick on the x axis and a single tick on the y axis, with a separate color for each of the value ranges listed above. However, with the code above I see only one color in the cell.
Could you please help me show multiple colors within a single cell?
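One thing that may be worth checking, judging only from the rows printed above: every row has the same objx and objy, so geom_tile() would draw all tiles at the same position, on top of each other. The sketch below counts rows per cell; the data frame is retyped from the printout and the column name n_rows is an illustrative choice.

```r
# Rows retyped from the df1 printout above (values column only)
df1 <- data.frame(
  objy = 1,
  objx = 15,
  values = c(1219, 3911, 3224, 14708, 5054, 31499, 1131, 4368, 2749, 666, 1982)
)

# Count how many rows fall into each (objx, objy) cell
cells <- aggregate(values ~ objx + objy, data = df1, FUN = length)
names(cells)[3] <- "n_rows"
cells
```

If a cell holds more than one row, the tiles overlap and only the one drawn last is visible, which would be consistent with seeing a single color.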

Plot boxplots over time using multiple categories

I am sorry for the title; I was not sure how to phrase the question.
I have a data frame that looks like this.
Sample=c("A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B")
Treatment=c("twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook",
"twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook")
replicate=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
time=c( 10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20)
points=c(20,40,80,20,60,120, 30,100,55, 28, 45,90, 80,20,100, 40,90,56,20,30,12,3,5,8)
length(points)
Sample Treatment replicate time points
1 A twiter 1 10 20
2 A twiter 2 10 40
3 A twiter 3 10 80
4 B twiter 1 10 20
5 B twiter 2 10 60
6 B twiter 3 10 120
7 A facebook 1 10 30
8 A facebook 2 10 100
9 A facebook 3 10 55
10 B facebook 1 10 28
11 B facebook 2 10 45
12 B facebook 3 10 90
13 A twiter 1 20 80
14 A twiter 2 20 20
15 A twiter 3 20 100
16 B twiter 1 20 40
17 B twiter 2 20 90
18 B twiter 3 20 56
19 A facebook 1 20 20
20 A facebook 2 20 30
21 A facebook 3 20 12
22 B facebook 1 20 3
23 B facebook 2 20 5
24 B facebook 3 20 8
I would like to plot my data using boxplots at each time point.
I would like one plot that shows four boxes at time point 10 (Sample A with "twiter", Sample A with "facebook",
Sample B with "twiter" and Sample B with "facebook") and the same four boxes at time point 20.
So far I can do something like this.
ggplot(data,aes(x=time, y=points,color=Sample, fill=Sample, group=interaction(Sample,Treatment)), alpha=0.1) +
geom_boxplot(alpha=0.1) +
geom_point(position = position_dodge(width=0.75), alpha=0.2)+
theme_bw()
But this is wrong. I would like to have samples A and B from the two different treatments next to each other at each time point, so that I can see the differences. I don't want to use facet_wrap. It is a challenge for me. Thank you for your time.
Turning my comment into an answer: your issue is that group=interaction(Sample,Treatment) overrides the grouping by the x-axis (time) that would normally be done. To include time in the grouping, add it to the interaction:
ggplot(data,
aes(
x = time,
y = points,
color = Sample,
fill = Sample,
group = interaction(Sample, Treatment, time)
),
alpha = 0.1) +
geom_boxplot(alpha = 0.1) +
geom_point(position = position_dodge(width = 0.75), alpha = 0.2) +
theme_bw()
Of course, the issue remains that there's no way to tell which box goes with which treatment, but I'll leave that to you to address.
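One possible way to tell the treatments apart, offered only as a sketch and not as the answerer's method, is to map Treatment to a second aesthetic such as linetype. This version treats time as a factor so that ggplot2 groups by every discrete variable automatically; the data frame name dat is an assumption.

```r
library(ggplot2)

# Same values as in the question, built with rep() for brevity
dat <- data.frame(
  Sample    = rep(rep(c("A", "B"), each = 3), times = 4),
  Treatment = rep(rep(c("twiter", "facebook"), each = 6), times = 2),
  replicate = rep(1:3, times = 8),
  time      = rep(c(10, 20), each = 12),
  points    = c(20, 40, 80, 20, 60, 120, 30, 100, 55, 28, 45, 90,
                80, 20, 100, 40, 90, 56, 20, 30, 12, 3, 5, 8)
)

# fill distinguishes Sample, linetype distinguishes Treatment;
# with a discrete x, the boxes are dodged within each time point
p <- ggplot(dat, aes(x = factor(time), y = points,
                     fill = Sample, linetype = Treatment)) +
  geom_boxplot(alpha = 0.1, position = position_dodge(width = 0.8)) +
  labs(x = "time") +
  theme_bw()
```

This yields one box per Sample, Treatment and time combination, and the legend shows which line type belongs to which treatment.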
Try this:
library(dplyr)
library(ggplot2)
#Plot
data %>%
arrange(Sample) %>%
mutate(Var=paste(Sample,Treatment),
Var=factor(Var,levels = unique(Var),ordered = T)) %>%
ggplot(aes(x=time,
y=points,
color=Var, fill=Var,
group=Var), alpha=0.1) +
geom_boxplot(alpha=0.1)+
geom_point(position = position_dodge(width=0.75), alpha=0.2)+
theme_bw()+
scale_color_manual(values=c('tomato','tomato','cyan3','cyan3'))+
scale_fill_manual(values=c('tomato','tomato','cyan3','cyan3'))
If you don't mind making time a factor, you can do the following. Note that I turned your data into a data frame named 'dat'.
dat <- data.frame(Sample=c("A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B","A","A", "A", "B","B","B"),
Treatment=c("twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook",
"twiter","twiter","twiter","twiter","twiter","twiter","facebook","facebook","facebook","facebook","facebook","facebook"),
replicate=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
time=c( 10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20),
points=c(20,40,80,20,60,120, 30,100,55, 28, 45,90, 80,20,100, 40,90,56,20,30,12,3,5,8))
dat %>%
mutate(time = factor(time)) %>%
ggplot(aes(x=time, y=points, color=Sample, fill=Sample), alpha=0.1) +
geom_boxplot(alpha=0.1) +
geom_point(position = position_dodge(width=0.75), alpha=0.2)+
theme_bw()

ggplot sorting axis with flipped coordinates and faceted graph

I have a dataset (LDA output) that looks like this.
lda_tt <- tidy(ldaOut)
lda_tt <- lda_tt %>%
group_by(topic) %>%
top_n(10, beta) %>%
ungroup() %>%
arrange(topic, -beta)
topic term beta
1 1 council 0.044069733
2 1 report 0.020086205
3 1 budget 0.016918569
4 1 polici 0.01646605
5 1 term 0.015051927
6 1 annual 0.014938797
7 1 control 0.014316583
8 1 audit 0.013637803
9 1 rate 0.012732765
10 1 fund 0.011997421
11 2 debt 0.033760856
12 2 plan 0.030379431
13 2 term 0.02925229
14 2 fiscal 0.021836885
15 2 polici 0.017802904
16 2 mayor 0.015548621
17 2 transpar 0.013175692
18 2 relat 0.012997722
19 2 capit 0.012463813
20 2 long 0.011989227
21 2 remain 0.011989227
22 3 parti 0.031795751
23 3 elect 0.029929187
24 3 govern 0.025496098
25 3 mayor 0.023046232
26 3 district 0.014588364
27 3 public 0.014471704
28 3 administr 0.013596752
29 3 budget 0.011730188
30 3 polit 0.011730188
31 3 seat 0.010563586
32 3 state 0.010563586
33 4 budget 0.037069484
34 4 revenu 0.025043026
35 4 account 0.018459577
36 4 oper 0.01721546
37 4 tax 0.015867667
38 4 debt 0.014416198
39 4 compani 0.013690464
40 4 expenditur 0.012135318
41 4 consolid 0.011305907
42 4 increas 0.010891202
43 5 invest 0.026534237
44 5 elect 0.023341538
45 5 administr 0.022296654
46 5 improv 0.02189031
47 5 develop 0.019162003
48 5 project 0.017826874
49 5 transport 0.016375647
50 5 local 0.016317598
51 5 infrastr 0.014401978
52 5 servic 0.014111733
I want to create 5 plots by topic with terms ordered by beta. This is the code
lda_tt %>%
mutate(term = reorder(term, beta)) %>%
ggplot(aes(term, beta, fill = factor(topic))) +
geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
coord_flip()
I get this graph
As you can see, despite the sorting efforts, the terms are not ordered by beta. For example, the term "budget" should be the top term in topic 4, "invest" should be at the top of topic 5, and so on. How can I sort the terms within each topic on each graph? There are several questions on Stack Overflow about ggplot sorting, but none of them helped me solve the problem.
The link suggested by Tung provides a solution to the problem. It seems that each term needs to be coded as a distinct factor to get proper sorting. We can add " _ " and the topic number to each term (done in lines 2 and 3), but display only the terms without "_" and the topic number (last line of code takes care of that). The following code generates a faceted graph with proper sorting.
lda_tt %>%
mutate(term = factor(paste(term, topic, sep = "_"),
levels = rev(paste(term, topic, sep = "_")))) %>%#convert to factor
ggplot(aes(term, beta, fill = factor(topic))) +
geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
coord_flip() +
scale_x_discrete(labels = function(x) gsub("_.+$", "", x)) #remove "_" and topic number
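To see why the suffix is needed, here is a small sketch on made-up data (the data frame toy and its beta values are hypothetical, not taken from the LDA output). A factor has a single level order shared by all facets, so a term that occurs in two topics cannot be sorted correctly in both.

```r
# Hypothetical toy data: "budget" appears in both topics
toy <- data.frame(
  topic = c(1, 1, 2, 2),
  term  = c("budget", "report", "budget", "debt"),
  beta  = c(0.05, 0.04, 0.01, 0.02)
)

# Global reorder: one level per term, ordered by the mean beta
# across topics, so the order is wrong inside both topics
global_levels <- levels(reorder(toy$term, toy$beta))

# Suffix trick: term_topic keys are unique, so each topic
# keeps its own ordering within the shared level set
toy$key <- paste(toy$term, toy$topic, sep = "_")
within_levels <- levels(reorder(toy$key, toy$beta))

# Strip the suffix again for display, as in scale_x_discrete() above
display_labels <- gsub("_.+$", "", within_levels)
```

With the global ordering, "report" ends up above "budget" even though "budget" has the larger beta in topic 1; with the suffixed keys, each topic's terms are ordered by their own beta.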

geom_bar removed 3 rows with missing values

I'm trying to create a histogram using ggplot2 in R.
This is the code I'm using:
library(tidyverse)
dat_male$explicit_truncated <- trunc(dat_male$explicit_mean)
means2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), mean, na.rm=TRUE)
colnames(means2) <- c("explicit", "id", "IAT_D")
sd2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), sd, na.rm=TRUE)
length2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), length)
se2 <- sd2$x / sqrt(length2$x)
means2$lo <- means2$IAT_D - 1.6*se2
means2$hi <- means2$IAT_D + 1.6*se2
ggplot(data = means2, aes(x = factor(explicit), y = IAT_D, fill = factor(id))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymin=lo,ymax=hi, width=.2), position=position_dodge(0.9), data=means2) +
xlab("Explicit attitude score") +
ylab("D-score")
For some reason I get the following warning message:
Removed 3 rows containing missing values (geom_bar).
And I get the following histogram:
I really have no clue what is going on.
Please let me know if you need to see anything else of my code, I'm never really sure what to include.
dat_male is a dataset that looks like this (I have only included the variables that I mentioned in this question, as the dataset contains 68 variables):
id explicit_mean IAT_D explicit_truncated
5 1 3.1250 0.366158652 3
6 1 3.3125 0.373590066 3
9 1 3.6250 0.208096230 3
11 1 3.1250 0.661983618 3
15 1 2.3125 0.348246184 2
19 1 3.7500 0.562406383 3
28 1 2.5625 -0.292888526 2
35 1 4.3750 0.560039531 4
36 1 3.8125 -0.117455439 3
37 1 3.1250 0.074375196 3
46 1 2.5625 0.488265849 2
47 1 4.2500 -0.131005579 4
53 1 2.0625 0.193040876 2
55 1 2.6875 0.875420303 2
62 1 3.8750 0.579146056 3
63 1 3.3125 0.666095380 3
66 1 2.8125 0.115607820 2
68 1 4.3750 0.259929946 4
80 1 3.0000 0.502709149 3
means2 is a dataset I have used to calculate means, and that looks like this:
explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864
Now that I see it in front of me, it probably has something to do with the NaNs?
From your dataset it seems like everything is alright.
The warnings that you get indicate that your data.frame contains missing values (i.e. NaN and NA).
I actually got two warning messages:
Warning messages:
1: Removed 1 rows containing missing values
(geom_bar).
2: Removed 2 rows containing missing values
(geom_errorbar).
Regarding the plot: the row with explicit = 0 has NaN for IAT_D, so no bar is drawn for it. Similarly, the row with explicit = 1 has NA for lo and hi, so it gets no error bar.
Dataset:
means2 <- read.table(text = " explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864",
header = TRUE)
plot:
means2 %>%
ggplot(aes(x = factor(explicit), y = IAT_D, fill = factor(id))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymin=lo,ymax=hi, width=.2),
position=position_dodge(0.9)) +
xlab("Explicit attitude score") +
ylab("D-score")
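As a hedged illustration of where such values can come from (a plausible origin, not something the question confirms): sd() of a single observation is NA, and mean() over values that are all missing returns NaN when na.rm = TRUE. The sketch below also drops the incomplete rows before plotting, which avoids the warnings; the name complete_rows is illustrative.

```r
# sd() of one value is NA; mean() of nothing but NAs is NaN
sd_single  <- sd(0.5)
mean_empty <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

# Same summary table as above
means2 <- read.table(text = " explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864",
header = TRUE)

# Keep only rows where the bar height and both error-bar ends are present
# (complete.cases() treats NaN as missing, like NA)
complete_rows <- means2[complete.cases(means2[, c("IAT_D", "lo", "hi")]), ]
```

Whether dropping those rows is appropriate depends on the analysis; a group with a single observation has a mean but no standard error, so you may prefer to keep its bar and omit only the error bar.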

How to create a stacked bar chart from summarized data in ggplot2

I'm trying to create a stacked bar graph using ggplot2. My data, in its wide form, looks like this. The numbers in each cell are the frequency of responses.
activity yes no dontknow
Social events 27 3 3
Academic skills workshops 23 5 8
Summer research 22 7 7
Research fellowship 20 6 9
Travel grants 18 8 7
Resume preparation 17 4 12
RAs 14 11 8
Faculty preparation 13 8 11
Job interview skills 11 9 12
Preparation of manuscripts 10 8 14
Courses in other campuses 5 11 15
Teaching fellowships 4 14 16
TAs 3 15 15
Access to labs in other campuses 3 11 18
Interdisciplinary research 2 11 18
Interdepartamental projects 1 12 19
I melted this table using reshape2:
melted.data <- melt(wide.data,id.vars=c("activity"),measure.vars=c("yes","no","dontknow"),variable.name="haveused",value.name="responses")
That's as far as I can get. I want to create a stacked bar chart with activities on the x axis, frequency of responses on the y axis, and each bar showing the distribution of the yes, no and dontknow responses.
I've tried
ggplot(melted.data,aes(x=activity,y=responses))+geom_bar(aes(fill=haveused))
but I'm afraid that's not the right solution
Any help is much appreciated.
You haven't said what it is that's not right about your solution. But some issues that could be construed as problems, and one possible solution for each, are:
The x axis tick mark labels run into each other. SOLUTION - rotate the tick mark labels;
The order in which the labels (and their corresponding bars) appear are not the same as the order in the original dataframe. SOLUTION - reorder the levels of the factor 'activity';
To position text inside the bars set the vjust parameter in position_stack to 0.5
The following might be a start.
# Load required packages
library(ggplot2)
library(reshape2)
# Read in data
df = read.table(text = "
activity yes no dontknow
Social.events 27 3 3
Academic.skills.workshops 23 5 8
Summer.research 22 7 7
Research.fellowship 20 6 9
Travel.grants 18 8 7
Resume.preparation 17 4 12
RAs 14 11 8
Faculty.preparation 13 8 11
Job.interview.skills 11 9 12
Preparation.of.manuscripts 10 8 14
Courses.in.other.campuses 5 11 15
Teaching.fellowships 4 14 16
TAs 3 15 15
Access.to.labs.in.other.campuses 3 11 18
Interdisciplinary.research 2 11 18
Interdepartamental.projects 1 12 19", header = TRUE, sep = "")
# Melt the data frame
dfm = melt(df, id.vars=c("activity"), measure.vars=c("yes","no","dontknow"),
variable.name="haveused", value.name="responses")
# Reorder the levels of activity
dfm$activity = factor(dfm$activity, levels = df$activity)
# Draw the plot
ggplot(dfm, aes(x = activity, y = responses, group = haveused)) +
geom_col(aes(fill=haveused)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25)) +
geom_text(aes(label = responses), position = position_stack(vjust = .5), size = 3) # labels inside the bar segments
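A note on why the answer uses geom_col(): geom_bar() counts rows by default (stat = "count") and does not accept a precomputed y value, whereas geom_col() plots the values as given. The sketch below, on a two-activity subset retyped from the table (the name toy is illustrative), checks that geom_col() and geom_bar(stat = "identity") build the same stacked bars.

```r
library(ggplot2)

# Two activities from the table above, already in long form
toy <- data.frame(
  activity  = rep(c("Social events", "TAs"), each = 3),
  haveused  = rep(c("yes", "no", "dontknow"), times = 2),
  responses = c(27, 3, 3, 3, 15, 15)
)

# geom_col() uses the y values directly
p_col <- ggplot(toy, aes(activity, responses, fill = haveused)) +
  geom_col()

# geom_bar() needs stat = "identity" to do the same
p_bar <- ggplot(toy, aes(activity, responses, fill = haveused)) +
  geom_bar(stat = "identity")

# Compare the top edge of every stacked segment in the two plots
ymax_col <- ggplot_build(p_col)$data[[1]]$ymax
ymax_bar <- ggplot_build(p_bar)$data[[1]]$ymax
same_stacks <- isTRUE(all.equal(ymax_col, ymax_bar))
```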
