I'm trying to plot the table below using a grouped barplot with ggplot2.
How do I plot it so that Scheduled Audits and Noofemails are plotted side by side for each day?
   Email Type Sent Month Sent Day Scheduled Audits Noofemails
27          A          1       30                7        581
29          A          1       31                0          9
1           A          2        1                2          8
26          B          1       29             1048      25312
28          B          1       30               23        170
30          B          1       31               18        109
2           B          2        1                6         93
3           B          2        2                9         86
4           B          2        4                3         21
ggplot(joined, aes(x=`Sent Day`, y=`Scheduled Audits`, fill = Noofemails )) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = c(1:29)) +
ggtitle("Number of emails sent in February") +
theme_classic()
Does not achieve the plot I hope to see.
I'm using this data format, with slightly different column names so that back-ticks are no longer needed. read.table(text = "") is a nice way to share small datasets on Stack Overflow.
joined <- read.table(text =
"ID Email_Type Sent_Month Sent_Day Scheduled_Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21",
header = TRUE)
This is why ggplot2 really prefers long data to wide data: it needs a single column to map to each aesthetic.
So you can use the function tidyr::gather() to rearrange the two columns of interest into one column of labels and one column of values. This increases the number of rows in the data frame, which is why the format is called long.
long <- tidyr::gather(joined,"key", "value", Scheduled_Audits, Noofemails)
ggplot(long, aes(Sent_Day, value, fill = key)) +
geom_col(position = "dodge")
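The same reshaping can also be sketched with tidyr::pivot_longer(), which the tidyr documentation recommends over gather() in newer versions (this assumes tidyr 1.0.0 or later is installed). The column names "key" and "value" are kept only so the result matches the gather() version above.

```r
library(tidyr)

joined <- read.table(text =
"ID Email_Type Sent_Month Sent_Day Scheduled_Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21",
header = TRUE)

# Stack the two measurement columns into one label column ("key")
# and one value column ("value"); every original row becomes two rows.
long <- pivot_longer(joined,
                     cols = c(Scheduled_Audits, Noofemails),
                     names_to = "key",
                     values_to = "value")

nrow(joined)  # number of rows before reshaping
nrow(long)    # number of rows after reshaping
```

The resulting data frame has the same columns as the gather() result, so it should work with the same ggplot() call.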
Alternatively, you can use the melt() function from the reshape2 package. See the example below.
library("ggplot2")
library(reshape2)
joined2 <- melt(joined[,c("Sent_Day", "Noofemails", "Scheduled_Audits")], id="Sent_Day")
ggplot(joined2, aes(x=`Sent_Day`, y= value, group = variable, fill= variable)) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = 1:31) +  # the data includes days 30 and 31
ggtitle("Number of emails sent in February") +
theme_classic()
I'm trying to create a histogram using ggplot2 in R.
This is the code I'm using:
library(tidyverse)
dat_male$explicit_truncated <- trunc(dat_male$explicit_mean)
means2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), mean, na.rm=TRUE)
colnames(means2) <- c("explicit", "id", "IAT_D")
sd2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), sd, na.rm=TRUE)
length2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), length)
se2 <- sd2$x / sqrt(length2$x)
means2$lo <- means2$IAT_D - 1.6*se2
means2$hi <- means2$IAT_D + 1.6*se2
ggplot(data = means2, aes(x = factor(explicit), y = IAT_D, fill = factor(id))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymin=lo,ymax=hi, width=.2), position=position_dodge(0.9), data=means2) +
xlab("Explicit attitude score") +
ylab("D-score")
For some reason I get the following warning message:
Removed 3 rows containing missing values (geom_bar).
And I get the following histogram:
I really have no clue what is going on.
Please let me know if you need to see anything else of my code, I'm never really sure what to include.
dat_male is a dataset that looks like this (I have only included the variables that I mentioned in this question, as the dataset contains 68 variables):
id explicit_mean IAT_D explicit_truncated
5 1 3.1250 0.366158652 3
6 1 3.3125 0.373590066 3
9 1 3.6250 0.208096230 3
11 1 3.1250 0.661983618 3
15 1 2.3125 0.348246184 2
19 1 3.7500 0.562406383 3
28 1 2.5625 -0.292888526 2
35 1 4.3750 0.560039531 4
36 1 3.8125 -0.117455439 3
37 1 3.1250 0.074375196 3
46 1 2.5625 0.488265849 2
47 1 4.2500 -0.131005579 4
53 1 2.0625 0.193040876 2
55 1 2.6875 0.875420303 2
62 1 3.8750 0.579146056 3
63 1 3.3125 0.666095380 3
66 1 2.8125 0.115607820 2
68 1 4.3750 0.259929946 4
80 1 3.0000 0.502709149 3
means2 is a dataset I have used to calculate means, and that looks like this:
explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864
Now that I see it in front of me, it probably has something to do with the NaNs?
From your dataset, everything seems to be all right.
The warnings you get indicate that your data.frame contains missing values (NaN and NA).
I actually got two warning messages:
Warning messages:
1: Removed 1 rows containing missing values
(geom_bar).
2: Removed 2 rows containing missing values
(geom_errorbar).
Regarding the plot: the row with explicit = 0 has NaN for IAT_D, so no bar is drawn for it. Similarly, the row with explicit = 1 has NA for lo and hi, so it gets a bar but no error bar.
Dataset:
means2 <- read.table(text = " explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864",
header = TRUE)
Plot:
library(tidyverse)  # provides ggplot2 and the %>% pipe
means2 %>%
ggplot(aes(x = factor(explicit), y = IAT_D, fill = factor(id))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymin=lo,ymax=hi, width=.2),
position=position_dodge(0.9)) +
xlab("Explicit attitude score") +
ylab("D-score")
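One plausible explanation for the missing values, sketched with base R only: a group whose IAT_D values are all NA gives a mean of NaN once na.rm = TRUE removes everything, and a group with a single observation has a mean but no standard deviation. Whether this is what happened in the original dat_male is an assumption, since the full data are not shown. The filtering step at the end is a hypothetical clean-up, not something the question asked for.

```r
# A group whose values are all missing: after na.rm = TRUE nothing is left,
# and the mean of an empty vector is NaN.
all_missing <- c(NA_real_, NA_real_)
mean(all_missing, na.rm = TRUE)

# A group with exactly one observation: the mean is defined,
# but sd() of a single value is NA, so lo and hi become NA too.
single <- c(-0.04010111)
mean(single)
sd(single)

means2 <- read.table(text = " explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864",
header = TRUE)

# Hypothetical clean-up: keep only rows where both the bar and its
# error bar can be drawn (is.na() is TRUE for NA and for NaN).
drawable <- !is.na(means2$IAT_D) & !is.na(means2$lo) & !is.na(means2$hi)
means2_complete <- means2[drawable, ]
nrow(means2_complete)
```

Plotting means2_complete instead of means2 should silence both warnings, at the cost of dropping the explicit = 1 bar, which has a mean but no error bar.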
I have a table of data that already contains several values to be plotted on a barplot with the ggplot2 package (the data are already cumulative).
The data in the data frame "reserves" has the form (simplified):
period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15
The first column (period) is the geological epoch. It will be on x axis, and I needed to have no extra ordering on it, so I prepared appropriate factor labelling with the command
reserves$period <- factor(reserves$period, levels = reserves$period)
The column "amount" is the main column to be plotted on the y axis (it is the percentage of hydrocarbons in each epoch, but it could just as well be in absolute values, say, millions of tons). So the basic plot is invoked by the command:
ggplot(reserves,aes(x=period,y=amount)) + geom_bar(stat="identity")
But here is the question. I need to plot the other values, that is a1-a2, b1-b2, and h1-h4, on the same bar graph. These values are percentages within each letter group (for example, if a1=60, then a2=40; the same holds for b1-b2, and h1-h4 also sum to 100).
So I need the values a1-a2 shown in some color, dividing the "amount" bar proportionally for each value of x (a stacked barplot), and then the same for b1-b2. That gives two adjacent columns for each period (grouped barplots), each of them stacked.
Finally, I need a third column for the values h1-h4, perhaps also as a stacked barplot, either as a third column or as a staggered barplot above the first one.
So the layout looks like this:
I learned that I first need to reshape the data with the reshape2 package, and then use the option position="dodge" or position="fill" in geom_bar(), but here I need a combination of the two. And the third barplot (for values h1-h4) seems to need a "stacked percent" representation with fixed height.
Are there packages which handle the data for plotting in a more intuitive way? Let's say we just declare that we want the variables ai, bi, hi to be plotted.
First you should reshape your data from wide to long, then scale your proportions to their raw values. Then split your old column names (now levels of "lett") into their letters and numbers for labeling. If your real data aren't formatted like this (a1...h4), there are ways to handle that as well.
library(dplyr)
library(tidyr)
library(ggplot2)
reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15")
reserves.tidied <- reserves %>%
gather(key = lett, value = prop, -period, -amount) %>%
mutate(rawvalue = prop * amount/100,
lett1 = substr(lett, 1, 1),
num = substr(lett, 2, 2))
reserves.tidied
period amount lett prop rawvalue lett1 num
1 J 18.1 a1 30 5.430 a 1
2 K 29.0 a1 65 18.850 a 1
3 P 13.3 a1 94 12.502 a 1
4 N 21.6 a1 95 20.520 a 1
5 J 18.1 a2 60 10.860 a 2
6 K 29.0 a2 35 10.150 a 2
7 P 13.3 a2 6 0.798 a 2
8 N 21.6 a2 5 1.080 a 2
9 J 18.1 b1 40 7.240 b 1
10 K 29.0 b1 75 21.750 b 1
11 P 13.3 b1 85 11.305 b 1
12 N 21.6 b1 80 17.280 b 1
13 J 18.1 b2 60 10.860 b 2
14 K 29.0 b2 25 7.250 b 2
15 P 13.3 b2 15 1.995 b 2
16 N 21.6 b2 20 4.320 b 2
17 J 18.1 h1 15 2.715 h 1
18 K 29.0 h1 5 1.450 h 1
19 P 13.3 h1 10 1.330 h 1
20 N 21.6 h1 10 2.160 h 1
21 J 18.1 h2 50 9.050 h 2
22 K 29.0 h2 50 14.500 h 2
23 P 13.3 h2 55 7.315 h 2
24 N 21.6 h2 55 11.880 h 2
25 J 18.1 h3 30 5.430 h 3
26 K 29.0 h3 40 11.600 h 3
27 P 13.3 h3 20 2.660 h 3
28 N 21.6 h3 20 4.320 h 3
29 J 18.1 h4 5 0.905 h 4
30 K 29.0 h4 5 1.450 h 4
31 P 13.3 h4 15 1.995 h 4
32 N 21.6 h4 15 3.240 h 4
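Because rawvalue is prop * amount / 100, the stacked segments of a bar only add up to amount when the percentages in that letter group sum to 100. A minimal base R check of that assumption on the sample data (the name group_totals is introduced here for illustration only):

```r
reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15")

# Sum the percentages within each letter group, one row per period.
group_totals <- data.frame(
  period = reserves$period,
  a = rowSums(reserves[, c("a1", "a2")]),
  b = rowSums(reserves[, c("b1", "b2")]),
  h = rowSums(reserves[, c("h1", "h2", "h3", "h4")])
)
print(group_totals)

# Periods where some group does not sum to 100.
off <- rowSums(group_totals[, c("a", "b", "h")] != 100) > 0
group_totals[off, ]
```

In the sample data as posted, a1 and a2 for period J sum to 90, so the "a" bar for J comes out shorter than its amount (5.43 + 10.86 = 16.29 rather than 18.1). That is probably a quirk of the simplified example, but it is worth checking in the real data.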
Then to plot your tidied data, you want the letters across the x axis, and the rawvalue we just calculated with amount*proportion on the y axis. We stack the geom_col up from 1 to 2 or 1 to 4 (the reverse=T argument overrides the default, which would have 2 or 4 at the bottom of the stack). alpha and fill let us distinguish between groups in the same bar and between bars.
Then the geom_text labels each stacked segment with the name, a newline, and the original percentage, centered on each segment. The scale reverses the default behavior again, making 1 the darkest and 2 or 4 the lightest in each bar. Then you facet across, making one group of bars for each period.
ggplot(reserves.tidied,
aes(x = lett1, y = rawvalue, alpha = num, fill = lett1)) +
geom_col(position = position_stack(reverse = T), colour = "black") +
geom_text(position = position_stack(reverse = T, vjust = .5),
aes(label = paste0(lett, ":\n", prop, "%")), alpha = 1) +
scale_alpha_discrete(range = c(1, .1)) +
facet_grid(~period) +
guides(fill = "none", alpha = "none")
Rearranging it so that the "h" bars are different from the "a" and "b" bars is a bit more complex, and you'd have to think about how you want it presented, but it's totally doable.
I am trying to make a stacked bar plot with the X axis being time, Y axis being amount, and fill color being certain features.
My data looks something like this:
> base
Number Mut Time Percent
1 117 22:A->G 2 81.81
2 13 24:G->A 2 9.09
3 10 22:A->G 24:G->A 108:G->A 158:G->A 162:G->A 2 6.99
4 1 22:A->G 24:G->A 2 0.69
5 32 24:G->A 3 94.11
6 1 24:G->A 162:G->T 3 2.94
7 1 24:G->A 82:G->T 3 2.94
When I do a stacked bar graph in ggplot using the code:
ggplot(base,aes(x = Time, fill = Mut, y = Percent)) + geom_bar(stat='identity') + theme(legend.key.size = unit(.5, "cm")) + ylab("Number")
I get a graph that looks like this:
http://imgur.com/32XCkTm,yfCAJsx#0
My problem is I want there to be zero values for time = 1 and time = 4.
Something similar to this:
http://imgur.com/32XCkTm,yfCAJsx#1
Is there a way I can do this? Right now I just added 0 values to the data for times 1 and 4 and added my fill feature(Mut) to be one that already showed up in the data:
> base
Reads Mut Time Percent
1 0 22:A->G 1 0.00
2 117 22:A->G 2 81.81
3 13 24:G->A 2 9.09
4 10 22:A->G 24:G->A 108:G->A 158:G->A 162:G->A 2 6.99
5 1 22:A->G 24:G->A 2 0.69
6 32 24:G->A 3 94.11
7 1 24:G->A 162:G->T 3 2.94
8 1 24:G->A 82:G->T 3 2.94
9 0 22:A->G 4 0.00
My problem is that I don't want to have to keep searching for a feature (Mut) that is already in the data. Is there a way to have ggplot automatically show x values for time = 1 and time = 4, with no bars, without having to add values to the data? I have been searching for hours and can't find any answers.
Thanks.
Add + scale_x_discrete(limits = 1:4) to your plot call, so that the x axis spans times 1 to 4 even though times 1 and 4 have no rows.
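An alternative sketch that does not rely on how a discrete scale treats a numeric Time column: convert Time to a factor that declares all four levels, then ask the scale not to drop unused levels. The data frame below is a four-row subset of the question's data, used only to keep the example short.

```r
library(ggplot2)

# Subset of the question's data: only times 2 and 3 are present.
base <- data.frame(
  Mut = c("22:A->G", "24:G->A", "24:G->A", "24:G->A 162:G->T"),
  Time = c(2, 2, 3, 3),
  Percent = c(81.81, 9.09, 94.11, 2.94)
)

# Declare all four time points as factor levels, including the empty ones.
base$Time <- factor(base$Time, levels = 1:4)

p <- ggplot(base, aes(x = Time, y = Percent, fill = Mut)) +
  geom_col() +
  scale_x_discrete(drop = FALSE)  # keep levels that have no rows on the axis
print(p)
```

With this approach there is no need to add dummy rows or pick an existing Mut value for them: the empty time points come from the factor levels alone.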
I have a data frame which reads as below:
factor bin ret
1 beta 1 -0.026840807
2 beta 2 -0.051610137
3 beta 3 -0.044658901
4 beta 4 -0.053322048
5 beta 5 -0.060173704
6 size 1 -0.047448288
7 size 2 -0.045603776
8 size 3 -0.051804757
9 size 4 -0.047044614
10 size 5 -0.045720971
11 liquidity 1 -0.057657070
12 liquidity 2 -0.053105474
13 liquidity 3 -0.045501401
14 liquidity 4 -0.048572585
15 liquidity 5 -0.032209038
16 nonlinear 1 -0.045752503
17 nonlinear 2 -0.047673201
18 nonlinear 3 -0.051107792
19 nonlinear 4 -0.045364070
20 nonlinear 5 -0.047722148
21 btop 1 -0.004399745
22 btop 2 -0.035082069
23 btop 3 -0.054526058
24 btop 4 -0.063497535
25 btop 5 -0.077123859
I would like to plot a panel of charts which looks similar to this:
The difference is that the chart I would like to create would have bin on the x axis and ret on the y axis, and the charts should be bar plots. Could anyone help me with this question?
FYI: The code for the sample plot I've included is:
print(ggplot(df, aes(date,value)) +ylab('return(bps)') + geom_line() + facet_wrap(~ series,ncol=input$numCol)+theme(strip.text.x = element_text(size = 20, colour = "red", angle = 0)))
I wonder if a minor change to the code could solve my problem.
From your description, I'll assume this is what you're after:
print(ggplot(df, aes(bin, ret)) +
ylab('return(bps)') +
geom_bar(stat="identity") +
facet_wrap(~ factor,ncol=2)+
theme(strip.text.x = element_text(size = 20, colour = "red", angle = 0)))