Creating grouped bar-plot of multi-column data in R - r

I have the following data
Input Rtime Rcost Rsolutions Btime Bcost
1 12 proc. 1 36 614425 40 36
2 15 proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82
I want to create a grouped bar chart from this data such that the x-axis contains the Input field (as groups) and the y-axis shows Rtime and Btime (the two bars) on a log scale.
All the solutions and examples I found online had similar data in a three-column layout. I do not know how to use the data I have to generate the grouped bar chart, or whether there is a way to convert it into a format that R and ggplot can work with. Converting it manually is not an option because it is a huge file with a lot of rows.
Edit: graph generated using gncs's solution.

As requested, a ggplot2 solution that also uses reshape2:
library(reshape2)
library(ggplot2)
df <- read.table(text = " Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82",header = TRUE,sep = "")
dfm <- melt(df[,c('Input','Rtime','Btime')],id.vars = 1)
ggplot(dfm,aes(x = Input,y = value)) +
geom_bar(aes(fill = variable),stat = "identity",position = "dodge") +
scale_y_log10()
Note a style difference here: since log10(1) = 0, ggplot2 treats the bars for Rtime = 1 as having zero height and doesn't draw anything, whereas barplot draws a little stub (which in my opinion is a little misleading).
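To see why those bars vanish, here is a minimal base-R sketch. The vector simply copies the Rtime column from the question; the variable names are made up for illustration.

```r
# Rtime values copied from the question's data
rtime <- c(1, 1, 5, 4, 4, 10)

# Bar heights as they would be drawn on a log10 axis
log_heights <- log10(rtime)
print(round(log_heights, 3))

# The first two inputs (Rtime = 1) have height zero, so no bar is visible
zero_height <- log_heights == 0
print(zero_height)
```

The two inputs with Rtime = 1 are the only ones flagged, which matches the missing bars described above.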

I think I understand the problem, and this is what I would suggest as a quick option:
data <- read.table("data.txt", header=TRUE)
subset <- t(data.frame(data$Rtime, data$Btime))
barplot(subset, legend = c("Rtime", "Btime"), names.arg=data$Input, log="y", beside=TRUE)
Is that what you want? It is kind of dirty, but it does the job.
Update: code corrected.

As requested, a ggplot2 solution that uses pivot_longer() (https://tidyr.tidyverse.org/reference/pivot_longer.html) to transform the data into a format that geom_bar() can easily plot.
library(tidyr)
library(ggplot2)
df <- read.table(text = " Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82",
header = TRUE,sep = "")
dfm <- pivot_longer(df[, c("Input", "Rtime", "Btime")], -Input, names_to = "variable", values_to = "value")
## pivot_longer takes the input data frame (restricted here to Input, Rtime and Btime, the two bars the question asks for), excludes the Input field from the transformation, turns the remaining column names into the variable "variable" (often called the "key"), and assigns the values to the variable "value".
ggplot(dfm,aes(x = Input,y = value)) +
geom_bar(aes(fill = variable),stat = "identity",position = "dodge") +
scale_y_log10()
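For readers who want to see the three-column layout the question mentions without loading any package, here is a rough base-R sketch using stack(). It is an alternative shown for illustration, not what the answers above use, and the name long is made up.

```r
df <- read.table(text = " Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82", header = TRUE)

# stack() puts the selected columns under each other: 'values' holds the
# numbers and 'ind' holds the name of the column each number came from
long <- data.frame(Input = rep(df$Input, times = 2),
                   stack(df[, c("Rtime", "Btime")]))
print(long)
```

Each Input now appears twice, once per measured column, which is the shape that melt() and pivot_longer() produce with different column names.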

joran's answer helped me a lot, but I had to use stat="identity" in the geom_bar call, like this:
ggplot(dfm, aes(x = Input,y = value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity") +
scale_y_log10()
My version of R is 3.2.2 and ggplot2 version 1.0.1
Thanks.

Related

Making multi-line plots in R using ggplot2

I would like to compile some data into a ggplot() line plot of different colors.
It's rainfall in various places over 100 days, and the data is quite different between locations which is giving me fits.
I've tried using different suggestions from this forum and they don't seem to be working well for this data. Sample data:
Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8
My code thus far is
ggplot(Rain) +
geom_line(aes(x=Time,y=Location1,col="red")) +
geom_line(aes(x=Time,y=Location2,col="blue")) +
geom_line(aes(x=Time,y=Location3,col="green")) +
scale_color_manual(labels = c("Location 1","Location 2","Location 3"),
values = c("red","blue","green")) +
xlab("Time (Days)") + ylab("Rainfall (Inches)") + labs(color="Locations") +
ggtitle("Rainfall Over 100 Days In Three Locations")
So far it gives me everything that I want but for some reason the colors are wrong when I plot it, i.e. it plots location 1 in green while I told it red in my first geom_line.
library(tidyr)
library(ggplot2)
df_long <- gather(data = df1, Place, Rain, -Time)
ggplot(df_long) +
geom_line(aes(x=Time, y=Rain, color=Place))
Data:
df1 <- read.table(text="Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8",
header = TRUE, stringsAsFactors = FALSE)
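A likely explanation for the colours in the question, offered as a hedged sketch: strings such as "red" inside aes() are treated as group labels rather than as colours, and scale_color_manual() then hands out its unnamed values to those labels in alphabetical order ("blue", "green", "red"), so the label "red" receives the third value, "green". With the long data, naming the values makes the mapping explicit. The snippet uses pivot_longer() instead of gather() and a shortened, rounded copy of the sample data; both are choices made for this illustration, as is the name place_colours.

```r
library(tidyr)
library(ggplot2)

# First four rows of the sample data, rounded for brevity
df1 <- read.table(text = "Time Location1 Location2 Location3
0 48 99.30 2
1 51 98.73 4
2 58 98.48 4.83
3 43 97.89 5.46", header = TRUE)

# Alphabetical order in which the labels from the question are matched to values
label_order <- sort(c("red", "blue", "green"))
print(label_order)

df_long <- pivot_longer(df1, -Time, names_to = "Place", values_to = "Rain")

# Named vector: each Place level is tied to its colour explicitly
place_colours <- c(Location1 = "red", Location2 = "blue", Location3 = "green")

p <- ggplot(df_long, aes(x = Time, y = Rain, colour = Place)) +
  geom_line() +
  scale_colour_manual(values = place_colours) +
  labs(x = "Time (Days)", y = "Rainfall (Inches)", colour = "Locations")
print(p)
```

Because the values are matched by name, the result no longer depends on the order in which the colours are listed.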

ggplot facets: show annotated text in selected facets

I want to create a 2 by 2 faceted plot with a vertical line shared by the four facets. However, because the facets on top have the same date information as the facets at the bottom, I only want to have the vline annotated twice: in this case in the two facets at the bottom.
I looked, among other places, here, which does not work for me. (I also doubt whether that code is still valid today.) I also looked here. I also looked up how to change the font size in geom_text: according to the help pages this is size. In the case below it doesn't work out well.
This is my code:
library(ggplot2)
library(tidyr)
my_df <- read.table(header = TRUE, text =
"Date AM_PM First_Second Systolic Diastolic Pulse
01/12/2017 AM 1 134 83 68
01/12/2017 PM 1 129 84 76
02/12/2017 AM 1 144 88 56
02/12/2017 AM 2 148 93 65
02/12/2017 PM 1 131 85 59
02/12/2017 PM 2 129 83 58
03/12/2017 AM 1 153 90 62
03/12/2017 AM 2 143 92 59
03/12/2017 PM 1 139 89 56
03/12/2017 PM 2 141 86 56
04/12/2017 AM 1 140 87 58
04/12/2017 AM 2 135 85 55
04/12/2017 PM 1 140 89 67
04/12/2017 PM 2 128 88 69
05/12/2017 AM 1 134 99 67
05/12/2017 AM 2 128 90 63
05/12/2017 PM 1 136 88 63
05/12/2017 PM 2 123 83 61
")
# setting the classes right
my_df$Date <- as.Date(as.character(my_df$Date), format = "%d/%m/%Y")
my_df$First_Second <- as.factor(my_df$First_Second)
# to tidy format
my_df2 <- gather(data = my_df, key = Measure, value = Value,
-c(Date, AM_PM, First_Second), factor_key = TRUE)
# Measures in 1 facet, facets split over AM_PM and First_Second
## add annotations column for geom_text
my_df2$Annotations <- rep("", 54)
my_df2$Annotations[c(4,6)] <- "Start"
p2 <- ggplot(data = my_df2) +
ggtitle("Blood Pressure and Pulse as a function of AM/PM,\n Repetition, and date") +
geom_line(aes(x = Date, y = Value, col= Measure, group = Measure), size = 1.) +
geom_point(aes(x = Date, y = Value, col= Measure, group = Measure), size= 1.5) +
facet_grid(First_Second ~ AM_PM) +
geom_vline(aes(xintercept = as.Date("2017/12/02")), linetype = "dashed",
colour = "darkgray") +
theme(axis.text.x=element_text(angle = -90))
p2
yields this graph:
This is the basic plot from which I start. Now we try to annotate it.
p2 + annotate(geom="text", x = as.Date("2017/12/02"), y= 110, label="start", size= 3)
yielding this plot:
This plot has the problem that the annotation occurs 4 times, while we only want it in the bottom parts of the graph.
Now we use geom_text, which will use the "Annotations" column in our dataframe, in line with this SO question. Be careful: the column added to the dataframe must be present when you create "p2" the first time (that is why we added the column above).
p2 + geom_text(aes(x=as.Date("2017/12/02"), y=100, label = Annotations, size = .6))
yielding this plot:
Yes, we succeeded in getting the annotation only in the bottom two parts of the graph. But the font is too big ( ... and ugly) and when we try to correct it with size, two things are interesting: (1) the font size is not changed (although you would expect that from the help pages) and (2) a legend is added.
I have been clicking around a lot and have been unable to solve this after hours and hours. Any help would be appreciated.
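One approach that may help, sketched here on made-up data rather than the blood-pressure table: give geom_text() its own small data frame with one row per facet that should be labelled, and set size outside aes() so it acts as a constant rather than a mapped variable (a mapped size is what produces the extra legend). The names toy and label_df and all the values are assumptions for illustration.

```r
library(ggplot2)

# Toy data with the same facet variables as the question (values are made up)
dates <- as.Date("2017-12-01") + 0:4
toy <- data.frame(
  Date = rep(dates, times = 4),
  AM_PM = rep(c("AM", "PM"), each = 5, times = 2),
  First_Second = rep(c("1", "2"), each = 10),
  stringsAsFactors = FALSE
)
toy$Value <- 100 + seq_len(nrow(toy))

# One row per facet that should carry the label: only First_Second == "2"
label_df <- data.frame(
  Date = as.Date("2017-12-02"),
  Value = 100,
  AM_PM = c("AM", "PM"),
  First_Second = "2",
  label = "Start",
  stringsAsFactors = FALSE
)

p <- ggplot(toy, aes(x = Date, y = Value)) +
  geom_line() +
  geom_vline(aes(xintercept = as.Date("2017-12-02")),
             linetype = "dashed", colour = "darkgray") +
  facet_grid(First_Second ~ AM_PM) +
  # size is a fixed setting here, not an aesthetic, so no legend is added
  geom_text(data = label_df, aes(label = label), size = 3)
print(p)
```

Since label_df only contains rows for First_Second == "2", the text should appear in the bottom two facets only, while the dashed line is still drawn in all four.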

geom_bar labeling for melted data / stacked barplot

I have a problem with drawing stacked barplot with ggplot. My data looks like this:
timeInterval TotalWilling TotalAccepted SimID
1 16 12 Sim1
1 23 23 Sim2
1 63 60 Sim3
1 69 60 Sim4
1 61 60 Sim5
1 60 54 Sim6
2 16 8 Sim1
2 23 21 Sim2
2 63 52 Sim3
2 69 64 Sim4
2 61 45 Sim5
2 60 32 Sim6
3 16 14 Sim1
3 23 11 Sim2
3 63 59 Sim3
3 69 69 Sim4
3 61 28 Sim5
3 60 36 Sim6
I would like to draw a stacked barplot for each simID over a timeInterval, and Willing and Accepted should be stacked. I achieved the barplot with the following simple code:
dat <- read.csv("myDat.csv")
meltedDat <- melt(dat,id.vars = c("SimID", "timeInterval"))
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
geom_bar(stat="identity", position = "stack")
I get the following graph:
My problem is that I would like to put a percentage on each part of the stack: Willing/(Willing+Accepted) on the Willing part and Accepted/(Accepted+Willing) on the Accepted part, so that I can see what percentage is willing and what percentage is accepted, e.g. 45 on the red part and 55 on the blue part of each stack. I cannot seem to achieve this kind of labeling.
Any hint is appreciated.
Adapted from "Showing data values on stacked bar chart in ggplot2":
library(reshape2)
library(plyr)
library(ggplot2)
meltedDat <- melt(dat, id.vars = c("SimID", "timeInterval"))
# share of each variable within its SimID/timeInterval stack, formatted as a percentage label
meltedDat <- ddply(meltedDat, .(timeInterval, SimID), transform,
                   valuestr = sprintf("%.1f%%", 100 * value / sum(value)))
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
  geom_bar(stat = "identity", position = "stack") +
  geom_text(aes(label = valuestr), position = position_stack(vjust = 0.5), size = 2)
Also, it looks like you may have some of your variables coded as factors.
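The percentage the question asks for is each value's share of its own stack. As a small worked example in base R, using two stacks copied from the question's data (the column names pct and label are made up for illustration):

```r
# Two stacks copied from the question's data (Sim1 and Sim2 at timeInterval 1)
long <- data.frame(
  SimID = c("Sim1", "Sim1", "Sim2", "Sim2"),
  timeInterval = c(1, 1, 1, 1),
  variable = c("TotalWilling", "TotalAccepted", "TotalWilling", "TotalAccepted"),
  value = c(16, 12, 23, 23)
)

# ave() returns the group total on every row of the group
stack_total <- ave(long$value, long$SimID, long$timeInterval, FUN = sum)

# Share of the stack, as a number and as a printable label
long$pct <- 100 * long$value / stack_total
long$label <- sprintf("%.1f%%", long$pct)
print(long)
```

For Sim1 the stack total is 16 + 12 = 28, so the two labels come out as 57.1% and 42.9%; for Sim2 both parts are 50.0%.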

Re-ordering the legend items in ggplot2 [closed]

I have a table with following data
marks cut but xut
1 49 51 67
2 53 47 76
3 54 46 67
4 54 46 56
5 55 45 65
6 55 45 75
7 55 45 45
8 55 45 33
9 55 45 43
10 56 45 53
11 56 45 23
12 56 44 78
13 56 44 45
When I plot the graph, the legend is ordered cut, but, xut. I want it ordered xut, but, cut, i.e. I want to re-order the legend items and present them in the order I need.
Below is the code I have implemented:
install.packages("plyr")
install.packages("ggplot2")
install.packages("reshape2")
library("plyr")
library("reshape2")
library("ggplot2")
data=read.csv("data.csv")
attach(data)
data$marks <- factor(data$marks, levels
= data$marks[order(data$cut)])
c.data=melt(data, id.var="marks")
n.data = ddply(c.data,.(marks), transform, pos = cumsum(value) - 0.5*value)
n.data <- transform(n.data,variable = factor(levels = c("xut", "but", "cut")))
plot = ggplot(d.data, aes(x = marks, y = value)) + geom_bar(stat = "identity",mapping = aes(x = value, fill = variable)) + scale_y_continuous( breaks=seq(0,100, by = 10))+geom_text(aes(label = value, y = pos), size = 3, face="bold", colour="white") + scale_fill_manual(values=c("455555","333333","335566")) + theme(axis.line = element_line(),axis.text.x=element_text(angle=60,hjust=1,colour="white"),axis.text.y=element_text(colour="white"),axis.title.x = element_blank(),axis.title.y = element_blank(),panel.background = element_blank(),axis.ticks=element_blank()) + labs(fill="")+coord_cartesian(ylim=c(0,100)) + theme(legend.position = "bottom", legend.direction = "horizontal")
Reorder the levels of the variable with the groups in the long or melted version of the data. For example, using your data
foo <- read.table(text="marks cut but xut
1 49 51 67
2 53 47 76
3 54 46 67
4 54 46 56
5 55 45 65
6 55 45 75
7 55 45 45
8 55 45 33
9 55 45 43
10 56 45 53
11 56 45 23
12 56 44 78
13 56 44 45", header = TRUE)
Melt it into a suitable format
require(reshape2)
require(ggplot2)
bar <- melt(foo, id = "marks")
> head(bar)
marks variable value
1 1 cut 49
2 2 cut 53
3 3 cut 54
4 4 cut 54
5 5 cut 55
6 6 cut 55
Then set the levels on the variable factor containing the group labels
bar <- transform(bar,
variable = factor(variable, levels = c("xut", "but", "cut")))
Then plot
ggplot(bar) + geom_bar(mapping = aes(x = value, fill = variable))
As you don't show any plotting code I'm guessing what your actual plot code looks like, but as the above shows, at least the ordering is what you want...
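The re-ordering step can be checked on its own in base R, without plotting. This is a minimal sketch; the three-element vector stands in for the variable column that melt() produces. Note that the transform() line in the question calls factor(levels = ...) without passing variable as the first argument, which is the piece the version above adds.

```r
# Group labels as melt() would produce them, with levels in the original column order
variable <- factor(c("cut", "but", "xut"), levels = c("cut", "but", "xut"))
print(levels(variable))

# Passing the existing factor plus the desired level order re-orders the levels
reordered <- factor(variable, levels = c("xut", "but", "cut"))
print(levels(reordered))

# The underlying labels are unchanged; only the level order differs
same_labels <- identical(as.character(variable), as.character(reordered))
print(same_labels)
```

ggplot2 lists the legend entries of a discrete fill scale in level order, so changing the levels is what changes the legend.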
