Formatting of grouped bar chart in ggplot - r

I am currently stuck on formatting a grouped bar chart.
I have a dataframe, which I would like to visualize:
iteration position value
1 1 eEP_SRO 20346
2 1 eEP_drift 22410
3 1 eEP_hole 29626
4 2 eEP_SRO 35884
5 2 eEP_drift 39424
6 2 eEP_hole 51491
7 3 eEP_SRO 51516
8 3 eEP_drift 55523
9 3 eEP_hole 74403
The position should be shown as color and the value should be represented in the height of the bar.
My code is:
fig <- ggplot(df_eEP_Location_plot, aes(fill=position, y=value, x=iteration, order=position)) +
geom_bar(stat="identity")
which gives me this result:
I would like to have a correct y-axis labelling and would also like to sort my bars from largest to smallest (ignoring the iteration number). How can I achieve this?
Thank you very much for your help!

I would recommend using fct_reorder from the forcats package to reorder your iterations along the specified values prior to plotting in ggplot. See the following with the sample data you've provided:
library(ggplot2)
library(forcats)
iteration <- factor(c(1,1,1,2,2,2,3,3,3))
position <- factor(rep(c("eEP_SRO","eEP_drift","eEP_hole"), times = 3))
value <- c(20346,22410,29626,35884,39424,51491,51516,55523,74403)
df_eEP_Location_plot <- data.frame(iteration, position, value)
df_eEP_Location_plot$iteration <- fct_reorder(df_eEP_Location_plot$iteration,
-df_eEP_Location_plot$value)
fig <- ggplot(df_eEP_Location_plot, aes(y=value, x=iteration, fill=position)) +
geom_bar(stat="identity")
fig
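The question also asks about the y-axis labels and about grouped rather than stacked bars. Here is a minimal sketch of one way to handle both. It assumes (this is a guess, the question does not show str(df)) that the odd y-axis comes from value being stored as text or as a factor, so it converts the column to numeric first. The data frame name df and the choice to order iterations by their total are my own, not from the original post.

```r
library(ggplot2)

# Sample data from the question. value is stored as text on purpose,
# to mimic the assumed cause of the odd y-axis labels.
df <- data.frame(
  iteration = factor(rep(1:3, each = 3)),
  position  = factor(rep(c("eEP_SRO", "eEP_drift", "eEP_hole"), times = 3)),
  value     = as.character(c(20346, 22410, 29626, 35884, 39424,
                             51491, 51516, 55523, 74403)),
  stringsAsFactors = FALSE
)

# Convert to numeric so ggplot draws a continuous y-axis.
# (For a factor column, use as.numeric(as.character(x)) instead.)
df$value <- as.numeric(df$value)

# Order the iterations by descending total value.
df$iteration <- reorder(df$iteration, -df$value, FUN = sum)

# position = "dodge" puts the bars side by side instead of stacking them.
fig <- ggplot(df, aes(x = iteration, y = value, fill = position)) +
  geom_col(position = "dodge")
fig
```

geom_col() is equivalent to geom_bar(stat="identity"), so either spelling works here.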

Related

Arithmetic operations and labelling in ggplot or R

I have a file that looks like this
2 3 LOGIC:A
2 5 LOGIC:A
3 4 LOGIC:Z
I plotted column 1 on x axis vs column 2 on y with column 3 acting as a legend
ggplot(Data, aes(V1, V2, col = V3)) + geom_point()
However, is it possible in ggplot itself to subtract column 1 from column 2 and label the 10 rows with the highest absolute difference, using the column 3 values as labels on those scatter points? I don't want to label the entire dataset, just the top 10 highest deltas.
You can try this (if your original dataframe is Data):
library(dplyr)
library(ggplot2)
Data$sub <- abs(Data$V2 - Data$V1)
Data2<- Data %>%
top_n(10,sub)
ggplot()+ geom_text(data=Data2,aes(V1,V2-0.1,label=V3))+
geom_point(data=Data,aes(V1,V2))
With the library dplyr you can filter the top values of a dataframe.
You can change "0.1" for a better value in your plot
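If you would rather not load dplyr, the same idea can be sketched in base R. The data below is made up (50 random rows standing in for the asker's file), and the names delta, n_labels and top are my own. One difference worth knowing: top_n keeps ties at the cutoff, while head returns at most n_labels rows.

```r
library(ggplot2)

set.seed(1)  # hypothetical data standing in for the asker's file
Data <- data.frame(
  V1 = runif(50, 0, 10),
  V2 = runif(50, 0, 10),
  V3 = sample(c("LOGIC:A", "LOGIC:Z"), 50, replace = TRUE)
)

# Absolute difference between column 2 and column 1.
Data$delta <- abs(Data$V2 - Data$V1)

# Keep only the rows with the largest deltas.
n_labels <- 10
top <- head(Data[order(Data$delta, decreasing = TRUE), ], n_labels)

# All points are drawn; only the rows in top get a text label.
p <- ggplot(Data, aes(V1, V2, colour = V3)) +
  geom_point() +
  geom_text(data = top, aes(label = V3), vjust = -0.8, show.legend = FALSE)
p
```

Using vjust instead of subtracting a fixed offset from V2 keeps the label position independent of the axis scale.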

using geom_bar to plot the sum of values by criteria in R

I'm new to R and I am trying to use ggplot to create one bar graph per id, all shown together. Each bar must represent the sum of the values in the d column by month-year (which is the c column). d has NA values as well as numeric values.
My dataframe, df, is something like this, but it has actually around 10000 rows:
#Example of my data
a=c(1,1,1,1,1,1,1,1,3)
b=c("2007-12-03", "2007-12-10", "2007-12-17", "2007-12-24", "2008-01-07", "2008-01-14", "2008-01-21", "2008-01-28","2008-02-04")
c=format(as.Date(b),"%m-%Y")
d=c(NA,NA,NA,NA,NA,4.80, 0.00, 5.04, 3.84)
df=data.frame(a,b,c,d)
df
a b c d
1 1 2007-12-03 12-2007 NA
2 1 2007-12-10 12-2007 NA
3 1 2007-12-17 12-2007 NA
4 1 2007-12-24 12-2007 NA
5 1 2008-01-07 01-2008 NA
6 1 2008-01-14 01-2008 4.80
7 1 2008-01-21 01-2008 0.00
8 1 2008-01-28 01-2008 5.04
9 3 2008-02-04 02-2008 3.84
I tried to do my graph using this:
mplot<-ggplot(df,aes(y=d,x=c))+
geom_bar()+
theme(axis.text.x = element_text(angle=90, vjust=0.5))+
facet_wrap(~ a)
I read from the help of geom_bar():
"geom_bar uses stat_count by default: it counts the number of cases at each x position"
So, I thought it would work like that, but I'm getting this error:
Error: stat_count() must not be used with a y aesthetic.
For the sample I'm providing, I would like to have the graph for id 1 that shows the months with NA empty and the 01-2008 with 9.84. Then for the second id, I would like to have again the months with NA empty and 02-2008 with 3.84.
I also tried to sum the data per month using aggregate and sum before plotting, and then using identity in the stat parameter of geom_bar, but I'm getting NA in some months and I don't know the reason.
I really appreciate your help.
You should use geom_col not geom_bar. See the help text:
There are two types of bar charts: geom_bar makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col instead. geom_bar uses stat_count by default: it counts the number of cases at each x position. geom_col uses stat_identity: it leaves the data as is.
So your final line of code should be:
ggplot(df, aes(y=d, x=c)) + geom_col() + theme(axis.text.x = element_text(angle=90, vjust=0.5))+facet_wrap(~ a)
Do you want something like this:
mplot = ggplot(df, aes(x = b, y = d))+
geom_bar(stat = "identity", position = "dodge")+
facet_wrap(~ a)
mplot
I am using x = b instead of x = c for now.
No need to use geom_col as suggested by @Jan. Simply use the weight aesthetic instead:
ggplot(iris, aes(Species, weight=Sepal.Width)) + geom_bar() + ggtitle("summed sepal width")
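On the NA values the asker got when aggregating first: sum() returns NA as soon as one value in the group is NA, unless na.rm = TRUE is passed. Below is a minimal sketch of the aggregate-then-plot route under that assumption. The name monthly is my own, and b is converted with as.Date so that format() can build the month-year column.

```r
library(ggplot2)

df <- data.frame(
  a = c(1, 1, 1, 1, 1, 1, 1, 1, 3),
  b = as.Date(c("2007-12-03", "2007-12-10", "2007-12-17", "2007-12-24",
                "2008-01-07", "2008-01-14", "2008-01-21", "2008-01-28",
                "2008-02-04")),
  d = c(NA, NA, NA, NA, NA, 4.80, 0.00, 5.04, 3.84)
)
df$c <- format(df$b, "%m-%Y")

# Sum d per id and month. na.rm = TRUE is passed on to sum(), so a month
# with only NA values ends up as 0 instead of NA.
monthly <- aggregate(df["d"], by = list(a = df$a, c = df$c),
                     FUN = sum, na.rm = TRUE)

mplot <- ggplot(monthly, aes(x = c, y = d)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  facet_wrap(~ a)
mplot
```

Note that c is a character column, so the months are placed in alphabetical order on the x-axis rather than chronological order. Turning c into a factor with the levels in the order you want is one way around that.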

Reordering legend while modifying one particular line for a line chart in ggplot

Let's say I have a simple data frame as shown below:
> A <- data.frame(x=1:10, a=rep(1,10), d=rep(2,10), b=rep(3,10))
> A
x a d b
1 1 1 2 3
2 2 1 2 3
3 3 1 2 3
4 4 1 2 3
5 5 1 2 3
6 6 1 2 3
7 7 1 2 3
8 8 1 2 3
9 9 1 2 3
10 10 1 2 3
I want to plot this with x on the x-axis and the other columns as lines on the y-axis. I want the line representing final column to be a little thicker than the other lines. So I can do this with the following code, which leads to the plot shown below it:
library(ggplot2)
#Plot that creates a thicker line for last column of data.
#However, order of legend is changed to alphabetical order.
p <- ggplot(A, aes(x))
for(i in 2:length(A)){
gg.data <- data.frame(x=A$x, value=A[,i], name=names(A)[i])
if(i==length(A)){
p <- p + geom_line(data=gg.data, aes(y=value, color=name), size=1.1)
} else{
p <- p + geom_line(data=gg.data, aes(y=value, color=name))
}
}
Now the problem with the plot above is that the order of the variables in the legend has changed to align with alphabetical order. I don't want that; instead I want the order to remain a,d,b.
I can keep the order as I wish by using melt and then plotting using the code below, but now I don't see how to increase the size of the line representing the last column in A.
library(reshape2)
Amelt <- melt(A, id.vars='x')
#Plot that orders legend according to order of columns in data frame.
#However, not sure how to thicken one particular line over the others.
pmelt <- ggplot(Amelt)+geom_line(aes(x=x, y=value, color=variable))
How can I get both things that I want?
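Have you tried using scale_color_discrete(breaks=c("a","d","b")) to specify the order of the legend entries? Your lines are mapped to the color aesthetic, so it is the color scale (not the fill scale) that controls this legend.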
Please have a look at this link:
http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/
Hope this helps!
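To get both things at once (legend in column order and one thicker line), here is a rough sketch on long-format data. It assumes ggplot2 3.4.0 or later, where line thickness is the linewidth aesthetic; on older versions size and scale_size_manual play the same role. The reshaping is done in base R, and the names vars and Along are my own.

```r
library(ggplot2)

A <- data.frame(x = 1:10, a = rep(1, 10), d = rep(2, 10), b = rep(3, 10))

# Reshape to long format by hand. Setting the factor levels to the column
# order (a, d, b) is what keeps the legend from being sorted alphabetically.
vars <- names(A)[-1]
Along <- data.frame(
  x        = rep(A$x, times = length(vars)),
  variable = factor(rep(vars, each = nrow(A)), levels = vars),
  value    = unlist(A[vars], use.names = FALSE)
)

# Map linewidth to "is this the last column?" and give the two cases
# explicit widths. guide = "none" hides the extra legend for linewidth.
p <- ggplot(Along, aes(x = x, y = value, color = variable)) +
  geom_line(aes(linewidth = variable == "b")) +
  scale_linewidth_manual(values = c(`FALSE` = 0.5, `TRUE` = 1.1),
                         guide = "none")
p
```

The same linewidth mapping should work on the melted Amelt from the question, since melt also stores variable as a factor in column order.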

How to draw a basic histogram with X and Y axis in R

I want to make a simple histogram which involves two vectors,
values <- c(1,2,3,4,5,6,7,8)
freq <- c(4,6,4,4,3,2,1,1)
df <- data.frame(values,freq)
Now the data.frame df contains the following values:
values freq
1 4
2 6
3 4
4 4
5 3
6 2
7 1
8 1
Now I want to draw a simple histogram, in which values are on the x axis and freq is on y axis. I am trying to use the hist function, but I am not able to give two variables. How can I make a simple histogram from this data?
using ggplot2:
library(ggplot2)
ggplot(df, aes(x = values, y = freq)) +
geom_bar(stat="identity")
Since you have the frequencies already, what you really want is a bar plot:
barplot(df$freq,names.arg=df$values)
If you've got your heart set on using hist, you should do:
hist(rep(df$values,df$freq))
Please read ?barplot and ?hist for further plotting options.
Also, because I'm somewhat of a zealot, I think the code looks cleaner if you use data.table:
library(data.table)
setDT(df) #convert df to a data.table by reference
df[,barplot(freq,names.arg=values)]
and
df[,hist(rep(values,freq))]
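One thing to watch with the hist(rep(...)) route: with integer data, the default breaks can fall exactly on the values, so neighbouring values may end up lumped into the same bin. A small sketch of a workaround is to put the breaks halfway between the integers. The names x and h are my own.

```r
values <- c(1, 2, 3, 4, 5, 6, 7, 8)
freq   <- c(4, 6, 4, 4, 3, 2, 1, 1)

# Expand the frequency table back into individual observations.
x <- rep(values, freq)

# Breaks at 0.5, 1.5, ..., 8.5 give each integer value its own bin.
h <- hist(x, breaks = seq(0.5, 8.5, by = 1), plot = FALSE)

# Draw the precomputed histogram.
plot(h, main = "Histogram of values", xlab = "values")
```

With these breaks, h$counts lines up one-to-one with freq, so the histogram shows the same heights as the bar plot.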

vertical line chart - change line plotting direction to top-down in R

I am looking for a way to connect data points from top to bottom in order to visualize a ranking. The y-axis represents the rank and the x-axis the attributes. With the default settings the line connects the points from left to right, so the points are connected in the wrong order.
With the data below the line should be connected from (6,1) to (4,2) and then (5,3) etc. Ideally the ranking scale should be inverted so that rank one starts at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched x- and y-coordinates. The next command (coord_flip()) flips x- and y-axis, and the last one (scale_x_reverse) changes the ordering of the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.
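As an alternative to flipping the coordinates, here is a sketch that uses geom_path(), which connects the points in the order the rows appear in the data, rather than geom_line(), which connects them in order of x. Sorting by rank first and reversing the y-axis should give the same top-down picture. The name ranked is my own.

```r
library(ggplot2)

data <- read.table(header = TRUE, text = "
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
")

# Sort the rows by rank so that geom_path() walks from rank 1 downwards.
ranked <- data[order(data$rank), ]

p <- ggplot(ranked, aes(x = attribute, y = rank)) +
  geom_path() +
  geom_point() +
  scale_y_reverse()  # puts rank 1 at the top
p
```

Here the axes keep their natural roles (attribute on x, rank on y), which can make further tweaks to labels and scales a little easier to reason about.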
