I have a problem drawing a stacked bar plot with ggplot. My data looks like this:
timeInterval TotalWilling TotalAccepted SimID
1 16 12 Sim1
1 23 23 Sim2
1 63 60 Sim3
1 69 60 Sim4
1 61 60 Sim5
1 60 54 Sim6
2 16 8 Sim1
2 23 21 Sim2
2 63 52 Sim3
2 69 64 Sim4
2 61 45 Sim5
2 60 32 Sim6
3 16 14 Sim1
3 23 11 Sim2
3 63 59 Sim3
3 69 69 Sim4
3 61 28 Sim5
3 60 36 Sim6
I would like to draw a stacked bar plot for each SimID over timeInterval, with Willing and Accepted stacked. I got the bar plot with the following simple code:
library(reshape2)
library(ggplot2)
dat <- read.csv("myDat.csv")
meltedDat <- melt(dat, id.vars = c("SimID", "timeInterval"))
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
  geom_bar(stat = "identity", position = "stack")
I get the following graph:
My problem is that I would like to put a percentage on each segment of the stack: Willing/(Willing+Accepted) on the Willing part and Accepted/(Accepted+Willing) on the Accepted part, so that I can see what share is willing and what share is accepted, for example 45 on the red part and 55 on the blue part of each bar. I cannot seem to achieve this kind of labeling.
Any hint is appreciated.
Adapted from "Showing data values on stacked bar chart in ggplot2":
library(plyr)
meltedDat <- melt(dat, id.vars = c("SimID", "timeInterval"))
# share of each variable within its bar (one bar per SimID and timeInterval)
meltedDat <- ddply(meltedDat, .(timeInterval, SimID), transform, normvalue = value / sum(value))
meltedDat$valuestr <- sprintf("%.2f%%", meltedDat$normvalue * 100)
# position_stack(vjust = 0.5) centers each label in its segment (needs ggplot2 >= 2.2.0)
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) + geom_bar(stat = "identity", position = "stack") + geom_text(aes(label = valuestr), position = position_stack(vjust = 0.5), size = 2)
Also, it looks like some of your variables may be coded as factors.
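For reference, here is a self-contained sketch of the percentage labels that can be run without the CSV. It assumes ggplot2 2.2.0 or later and uses only the first two intervals of Sim1 and Sim2 from the question's data. The names `long`, `share` and `label` are made up for this example, and the share is computed with base R's ave() rather than plyr.

```r
library(ggplot2)

# Small subset of the question's data, already in long format.
long <- data.frame(
  timeInterval = rep(c(1, 1, 2, 2), each = 2),
  SimID        = rep(c("Sim1", "Sim2", "Sim1", "Sim2"), each = 2),
  variable     = rep(c("TotalWilling", "TotalAccepted"), times = 4),
  value        = c(16, 12, 23, 23, 16, 8, 23, 21)
)

# Share of each segment within its bar (one bar per SimID x timeInterval).
long$share <- ave(long$value, long$SimID, long$timeInterval,
                  FUN = function(v) v / sum(v))
long$label <- sprintf("%.1f%%", 100 * long$share)

# position_stack(vjust = 0.5) puts each label in the middle of its segment.
p <- ggplot(long, aes(factor(timeInterval), value, fill = variable)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 3) +
  facet_wrap(~ SimID) +
  labs(x = "timeInterval")
```

Printing `p` draws the plot. The bar heights stay in the original units; only the labels are percentages.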
Here is the data example:
S P C P_int C_int
10 20 164 72 64
20 550 709 92 89
30 142 192 97 96
40 45 61 99 98
50 12 20 99 99
60 5 6 99 99
70 2 2 99 99
80 4 1 99 99
90 1 0 10 99
100 0 1 10 99
Let's say I have a data frame called df. The aim is a bar chart of variables P and C, with a line chart overlaid showing the sum of variables P_int and C_int. Currently I have these lines of code to create the bar chart:
final <- df %>% tidyr::gather(type, value, c(`P`, `C`))
ggplot(final, aes(S))+
geom_bar(aes(y=value, fill=type), stat="identity", position="dodge")
The thing I can't figure out is how to plot the sum of variables P_int and C_int as a line chart overlaid on the above plot with a second y-axis. I would appreciate any help.
Do you need something like this?
library(ggplot2)
library(dplyr)
ggplot(final, aes(S)) +
  geom_bar(aes(y = value, fill = type), stat = "identity", position = "dodge") +
  # take the line from df (one row per S); final repeats each S once per type, which would double the sum
  geom_line(data = df %>% mutate(total = P_int + C_int),
            aes(y = total), color = 'blue') +
  scale_y_continuous(sec.axis = sec_axis(~ . / 1)) +
  theme_classic()
I have kept the scale of the secondary y-axis the same as the primary y-axis since they are in the same range, but you might need to adjust it according to your real data.
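If the line and the bars end up on very different scales, one common approach is to rescale the line into the bar range and undo that rescaling in sec_axis(). The following is only a sketch using the sample data from the question; `k` and `line_df` are names invented here, and the axis titles are placeholders.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(
  S     = seq(10, 100, by = 10),
  P     = c(20, 550, 142, 45, 12, 5, 2, 4, 1, 0),
  C     = c(164, 709, 192, 61, 20, 6, 2, 1, 0, 1),
  P_int = c(72, 92, 97, 99, 99, 99, 99, 99, 10, 10),
  C_int = c(64, 89, 96, 98, 99, 99, 99, 99, 99, 99)
)

# Long format for the bars, as in the question.
final <- gather(df, type, value, c(P, C))

# One row per S for the line.
line_df <- data.frame(S = df$S, total = df$P_int + df$C_int)

# Factor that maps the line's range onto the bars' range.
k <- max(final$value) / max(line_df$total)

p <- ggplot(final, aes(S)) +
  geom_col(aes(y = value, fill = type), position = "dodge") +
  geom_line(data = line_df, aes(y = total * k), colour = "blue") +
  scale_y_continuous(
    name = "P / C",
    sec.axis = sec_axis(~ . / k, name = "P_int + C_int")  # undo the rescaling
  ) +
  theme_classic()
```

The line is drawn in primary-axis units (total * k), and the secondary axis divides by the same k so its tick labels read in the original P_int + C_int units.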
I have the following data
Input Rtime Rcost Rsolutions Btime Bcost
1 12 proc. 1 36 614425 40 36
2 15 proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82
I want to create a grouped bar chart from this data such that the x-axis contains the Input field (as groups) and the y-axis shows the Rtime and Btime fields (the two bars) on a log scale.
All solutions/examples I checked online had similar data in a three-column layout. I do not know how to use the data I have to generate the grouped bar chart, or whether there is a way to convert this data into a format compatible with R and ggplot (converting manually is not an option because it is a huge file with a lot of rows).
Edit :
Graph generated using gncs solution
As requested, a ggplot2 solution that also uses reshape2:
library(reshape2)
library(ggplot2)
df <- read.table(text = " Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82",header = TRUE,sep = "")
dfm <- melt(df[,c('Input','Rtime','Btime')],id.vars = 1)
ggplot(dfm,aes(x = Input,y = value)) +
geom_bar(aes(fill = variable),stat = "identity",position = "dodge") +
scale_y_log10()
Note a style difference here: since log(1) = 0, ggplot2 treats that as a bar of zero height and doesn't plot anything, whereas barplot plots a little stub (which in my opinion is a little misleading).
I think I understand the problem, and this is what I would suggest (a quick option):
data <- read.table("data.txt", header=TRUE)
subset <- t(data.frame(data$Rtime, data$Btime))
barplot(subset, legend = c("Rtime", "Btime"), names.arg=data$Input, log="y", beside=TRUE)
Is that what you want? It is kind of dirty, but it does the job.
Update: code corrected.
As requested, a ggplot2 solution that also uses pivot_longer() https://tidyr.tidyverse.org/reference/pivot_longer.html to transform the data into a format that geom_bar() can easily plot.
library(tidyr)  # pivot_longer() lives in tidyr, not dplyr
library(ggplot2)
df <- read.table(text = " Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82",
header = TRUE,sep = "")
dfm <- pivot_longer(df, c(Rtime, Btime), names_to="variable", values_to="value")
## pivot_longer takes the input data frame, stacks only the Rtime and Btime columns (the two bars asked for), turns their column names into the variable "variable" (often called the "key"), and assigns the values to the variable "value". The remaining columns are carried along unchanged.
ggplot(dfm,aes(x = Input,y = value)) +
geom_bar(aes(fill = variable),stat = "identity",position = "dodge") +
scale_y_log10()
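To make the "three-column layout" mentioned in the question concrete, here is a small sketch of the wide-to-long step on its own. It assumes tidyr 1.0.0 or later; `wide` and `long` are names made up for this example, and only the first three rows of the data are used.

```r
library(tidyr)

# Wide layout: one row per Input, one column per measured variable.
wide <- data.frame(
  Input = c("12-proc.", "15-proc.", "18-proc"),
  Rtime = c(1, 1, 5),
  Btime = c(40, 50, 66)
)

# Long layout: one row per (Input, variable) pair, which is what
# aes(x = Input, y = value, fill = variable) expects.
long <- pivot_longer(wide, cols = c(Rtime, Btime),
                     names_to = "variable", values_to = "value")
print(long)
```

The result has six rows and the three columns Input, variable and value, so no manual conversion of the file is needed.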
joran's answer helped me a lot, but I had to use stat="identity" in the ggplot statement, like this:
ggplot(dfm, aes(x = Input,y = value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity") +
scale_y_log10()
My version of R is 3.2.2 and ggplot2 version 1.0.1
Thanks.
I would like to compile some data into a ggplot() line plot of different colors.
It's rainfall in various places over 100 days, and the data is quite different between locations which is giving me fits.
I've tried using different suggestions from this forum and they don't seem to be working well for this data. Sample data:
Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8
My code thus far is
ggplot(Rain) +
geom_line(aes(x=Time,y=Location1,col="red")) +
geom_line(aes(x=Time,y=Location2,col="blue")) +
geom_line(aes(x=Time,y=Location3,col="green")) +
scale_color_manual(labels = c("Location 1","Location 2","Location 3"),
values = c("red","blue","green")) +
xlab("Time (Days)") + ylab("Rainfall (Inches)") + labs(color="Locations") +
ggtitle("Rainfall Over 100 Days In Three Locations")
So far it gives me everything that I want but for some reason the colors are wrong when I plot it, i.e. it plots location 1 in green while I told it red in my first geom_line.
library(tidyr)
library(ggplot2)
df_long <- gather(data = df1, Place, Rain, -Time)
ggplot(df_long) +
geom_line(aes(x=Time, y=Rain, color=Place))
Data:
df1 <- read.table(text="Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8",
header=T, stringsAsFactors=F)
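On why the colours came out wrong in the question's code: a string inside aes() is treated as a data value (a label), not as a colour. The discrete scale sorts those labels alphabetically ("blue", "green", "red") and then pairs them with `labels` and `values` by position, so the line labelled "red" (Location 1) is sorted last and receives the third value, "green". A sketch that keeps the chosen colours by naming the `values` vector follows. The long data frame is rebuilt here with base R from the first three rows (values rounded) so the snippet runs on its own, and `cols` is a name invented for this example.

```r
library(ggplot2)

# First three rows of the sample data, rounded for brevity.
df1 <- data.frame(
  Time      = 0:2,
  Location1 = c(48, 51, 58),
  Location2 = c(99.30, 98.73, 98.48),
  Location3 = c(2, 4, 4.83)
)

# Wide to long with base R (same shape that gather() produces).
df_long <- data.frame(
  Time  = rep(df1$Time, times = 3),
  Place = rep(c("Location1", "Location2", "Location3"), each = nrow(df1)),
  Rain  = c(df1$Location1, df1$Location2, df1$Location3)
)

# A named vector ties each colour to a level, independent of sort order.
cols <- c(Location1 = "red", Location2 = "blue", Location3 = "green")

p <- ggplot(df_long, aes(Time, Rain, colour = Place)) +
  geom_line() +
  scale_colour_manual(values = cols, name = "Locations") +
  labs(x = "Time (Days)", y = "Rainfall (Inches)")
```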
I am wondering what the idiomatic approach is for constructing a ggplot2 boxplot given precomputed values for each boxplot:
> df
base p10 p90 lower_quartile mean median upper_quartile
1 1 32 35 33 33.63740 34 34
2 2 32 35 33 33.77753 34 35
3 3 32 36 33 33.89361 34 35
4 4 33 36 33 33.89691 34 35
5 5 32 35 33 33.85145 34 35
6 6 35 37 37 36.48259 37 37
Attempting to draw these plots with
ggplot(df, aes(base)) +
geom_boxplot(aes(ymin = p10,
lower = lower_quartile,
middle = median,
upper = upper_quartile,
ymax = p90),
stat = "identity")
does not give the desired plots. What am I missing?
I don't know what base represents in your data.frame but in order to do it correctly your x-axis is supposed to be discrete (to show the different boxplots). Then for the y axis you need a ymin, a lower, a middle, an upper and a ymax which you have provided. The x-axis is the variable that is used to plot the different boxplots. So, if you turn it into a factor then it works:
library(ggplot2)
#I have added base as factor
ggplot(df, aes(factor(base))) +
geom_boxplot(aes(ymin = p10,
lower = lower_quartile,
middle = median,
upper = upper_quartile,
ymax = p90),
stat = "identity")
Output:
And this way it works.
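The likely reason the factor helps: with stat = "identity" each row is one precomputed box, and geom_boxplot() draws one box per group. A continuous x leaves all rows in a single group, which ggplot2 refuses to draw. Assuming that reading is right, an alternative sketch keeps `base` numeric and sets the group explicitly; the data frame below just retypes the values from the question.

```r
library(ggplot2)

df <- data.frame(
  base           = 1:6,
  p10            = c(32, 32, 32, 33, 32, 35),
  p90            = c(35, 35, 36, 36, 35, 37),
  lower_quartile = c(33, 33, 33, 33, 33, 37),
  median         = c(34, 34, 34, 34, 34, 37),
  upper_quartile = c(34, 35, 35, 35, 35, 37)
)

# group = base gives every row its own group, so each row becomes one box
# while the x-axis stays continuous.
p <- ggplot(df, aes(x = base, group = base)) +
  geom_boxplot(aes(ymin = p10,
                   lower = lower_quartile,
                   middle = median,
                   upper = upper_quartile,
                   ymax = p90),
               stat = "identity")
```

This can be useful when the x values are unevenly spaced and the spacing should be preserved; with factor(base) the boxes are always evenly spaced.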