R: Sort a Cleveland dot plot by a variable that is not shown

I followed this manual (https://afit-r.github.io/cleveland-dot-plots) to create a Cleveland dot plot. I was able to reproduce it, but I faced the following challenges:
1. How do I sort my y axis in historical order? The varieties on my y axis have different release years. Those years are not shown in my plot, but I would like to order the varieties chronologically by them. Right now they are in a weird alphabetical order starting from the bottom, and I don't know how to change that.
2. I couldn't manage to show the differences between the points as percentages (like in the manual). Could anyone explain that to me in more detail?
3. Do you see any way to include the same data for another year?
See below for my code:
require(ggplot2)
require(reshape2)
require(dplyr)
require(plotrix)
cleanup = theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
                panel.background = element_blank(), axis.line = element_line(color = "black"))
data19 = read.csv("Harvest_2019_V2.csv", sep = ";")
data19$Experiment_Year <- as.factor(data19$Experiment_Year)
data19$Release_year <- as.factor(data19$Release_year)
Subset2019 = subset(data19, Experiment_Year == 2019)
agHarvest.Weight <- aggregate(Subset2019[, 9],
                              list(Subset2019$Variety, Subset2019$Release_year, Subset2019$Treatment),
                              mean)
agHarvest.Weight$Variety <- agHarvest.Weight$Group.1
agHarvest.Weight$Release_Year <- agHarvest.Weight$Group.2
agHarvest.Weight$Treatment <- agHarvest.Weight$Group.3
agHarvest.Weight$Yield <- agHarvest.Weight$x
right_label <- agHarvest.Weight %>%
  group_by(Variety) %>%
  arrange(desc(Yield)) %>%
  top_n(1)
left_label <- agHarvest.Weight %>%
  group_by(Variety) %>%
  arrange(desc(Yield)) %>%
  slice(2)
ggplot(agHarvest.Weight, aes(Yield, Variety)) +
  geom_line(aes(group = Variety)) +
  geom_point(aes(color = Treatment), size = 1.5) +
  geom_text(data = right_label, aes(color = Treatment, label = round(Yield, 0)),
            size = 3, hjust = -.5) +
  geom_text(data = left_label, aes(color = Treatment, label = round(Yield, 0)),
            size = 3, hjust = 1.5) +
  scale_x_continuous(limits = c(2500, 4500)) + cleanup + xlab("Yield, g") +
  scale_color_manual(values = c("blue", "darkgreen"))

Understandably, you cannot always share your data. That is why it is always recommended to either use an existing, publicly available dataset or craft your own to produce a minimal reproducible example. Fortunately, I don't mind doing that for you here. :)
TL;DR: there are many ways, but the simplest is to use reorder(your_variable, variable_to_sort_by). Note that the y axis runs "bottom-up" rather than "top-down" on the plot.
Example Data
df <- data.frame(
Variety=rep(LETTERS[1:5], each=2),
Yield=c(265, 285, 458, 964, 152, 202, 428, 499, 800, 900),
Treatment=rep(c('first','second'), 5),
Year=rep(c(2000, 2001, 2010, 1999, 1998), each=2)
)
> df
Variety Yield Treatment Year
1 A 265 first 2000
2 A 285 second 2000
3 B 458 first 2001
4 B 964 second 2001
5 C 152 first 2010
6 C 202 second 2010
7 D 428 first 1999
8 D 499 second 1999
9 E 800 first 1998
10 E 900 second 1998
Basic Cleveland Dot Plot
p <- ggplot(df, aes(x=Yield, y=Variety)) +
geom_line(aes(group=Variety)) +
geom_point(size=3) +
geom_text(aes(label=Yield), nudge_y=0.2, size=2) +
theme_bw()
p
Sort Variety (Y axis) by Year Column
First, notice how ggplot2 arranges your axes. The key is that the origin of the plot is the bottom-left corner, so the lowest values of the x and y axes are at the left and the bottom, respectively. For a discrete axis, the "lowest value" is the first factor level. This is why df$Variety is alphabetical but "goes up" (from bottom to top). To reverse the y axis you can add scale_y_reverse() to your plot code, but that only works for continuous axes. For a discrete axis, you can pass the levels in reverse order as limits, e.g. scale_y_discrete(limits = rev(sort(unique(df$Variety)))). The following approach avoids that.
To sort the y axis by another column, you can use reorder() right inside the aes() call. reorder() turns its first argument into a factor whose levels are ordered by the mean of the second argument within each group. It is set up as follows:
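A quick way to check the bottom-up mapping is to look at the numeric positions ggplot2 assigns to each level. This is a minimal sketch with a small made-up data frame (toy_df is hypothetical); layer_data() returns the built data of a layer, in which a discrete y is stored as its level index:

```r
library(ggplot2)

# Hypothetical toy data: varieties deliberately listed out of alphabetical order
toy_df <- data.frame(Variety = c("B", "A", "C"), Yield = c(10, 20, 30))

# Default: the character column gets alphabetical levels A, B, C,
# and level 1 ("A") is drawn at position 1, i.e. the bottom of the y axis
p_default <- ggplot(toy_df, aes(Yield, Variety)) + geom_point()
pos_default <- as.numeric(layer_data(p_default)$y)
pos_default  # 2 1 3 (rows B, A, C)

# Reversed discrete limits: "C" now sits at position 1 (bottom), "A" on top
p_reversed <- p_default +
  scale_y_discrete(limits = rev(sort(unique(toy_df$Variety))))
pos_reversed <- as.numeric(layer_data(p_reversed)$y)
pos_reversed
```

The same level-index logic applies to the Variety axis of the dot plot: whichever variety ends up as the first level is drawn at the bottom.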
reorder(columnA, column_to_use_to_sort_columnA)
In this case, you want to sort df$Variety by df$Year, so this becomes:
reorder(Variety, Year)
...but remember how the y axis "goes up"? If you want the y axis sorted by df$Year but "going down", you can either reverse the discrete axis with scale_y_discrete(limits = rev) (passing a function reverses the automatically computed limits), or, more conveniently, sort by df$Year in reverse using the syntax:
reorder(Variety, -Year)
Putting this together you get this:
p1 <- ggplot(df, aes(x=Yield, y=reorder(Variety, -Year))) +
geom_line(aes(group=Variety)) +
geom_point(size=2) +
geom_text(aes(label=Yield), nudge_y=0.2, size=2) +
theme_bw()
p1
You'll see that we now have the proper order: df$Variety is sorted by ascending df$Year, starting from the top (1998) and going down to the bottom (2010).
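To see what reorder() actually does to the factor levels, here is a small check with the same example data (base R only). The levels are sorted ascending by the group mean of the second argument, and the first level is the one drawn at the bottom:

```r
# Same example data as above
df <- data.frame(
  Variety = rep(LETTERS[1:5], each = 2),
  Yield = c(265, 285, 458, 964, 152, 202, 428, 499, 800, 900),
  Treatment = rep(c("first", "second"), 5),
  Year = rep(c(2000, 2001, 2010, 1999, 1998), each = 2)
)

# Levels ordered by ascending mean Year per Variety: 1998 first (bottom of the plot)
by_year <- reorder(df$Variety, df$Year)
levels(by_year)      # "E" "D" "A" "B" "C"

# Negating Year flips the order: 2010 first (bottom), 1998 last (top)
by_year_rev <- reorder(df$Variety, -df$Year)
levels(by_year_rev)  # "C" "B" "A" "D" "E"
```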
Other ways?
There are other ways to do the sorting, but I found this one the most straightforward. The other, fundamentally different approach is to sort your data frame first and then plot. However, be aware that ggplot2 converts a character column into a factor, and the default factor levels are created by sorting the values alphabetically. This means that if you only sort your data frame and then plot, you'll still be stuck with alphabetical order. You need to sort, then explicitly convert df$Variety into a factor (specifying the levels), and then plot. Something like this works just the same:
df <- dplyr::arrange(df, -Year)  # arrange by descending Year
df$Variety <- factor(df$Variety, levels = unique(df$Variety))  # levels follow the sorted rows
# first level (2010) is drawn at the bottom, last level (1998) at the top,
# so no extra scale_y_discrete() reversal is needed
ggplot(df, aes(x = Yield, y = Variety)) +
  geom_line(aes(group = Variety)) +
  geom_point(size = 2) +
  geom_text(aes(label = Yield), nudge_y = 0.2, size = 2) +
  theme_bw()
The code above gives you the same plot as the reorder(Variety, -Year) method.
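The question also asked how to show the differences between the two treatments as percentages. A minimal sketch, using the example df from this answer and assuming the two Treatment levels are named "first" and "second" (with the real data they would be the OP's own treatment names), is to reshape to one row per variety, compute the relative change, and add it as an extra text layer:

```r
library(ggplot2)

df <- data.frame(
  Variety = rep(LETTERS[1:5], each = 2),
  Yield = c(265, 285, 458, 964, 152, 202, 428, 499, 800, 900),
  Treatment = rep(c("first", "second"), 5),
  Year = rep(c(2000, 2001, 2010, 1999, 1998), each = 2)
)

# One row per Variety, both treatments side by side:
# reshape() creates the columns Yield.first and Yield.second
wide <- reshape(df, idvar = c("Variety", "Year"), timevar = "Treatment",
                direction = "wide")

# Relative change of the second treatment versus the first, in percent
wide$pct_diff <- (wide$Yield.second - wide$Yield.first) / wide$Yield.first * 100

p_pct <- ggplot(df, aes(x = Yield, y = reorder(Variety, -Year))) +
  geom_line(aes(group = Variety)) +
  geom_point(aes(color = Treatment), size = 2) +
  # place each label just to the right of the larger of the two values
  geom_text(data = wide,
            aes(x = pmax(Yield.first, Yield.second),
                y = reorder(Variety, -Year),
                label = sprintf("%+.1f%%", pct_diff)),
            hjust = -0.3, size = 3, inherit.aes = FALSE) +
  labs(y = "Variety") +
  theme_bw()
```

Using the first treatment as the baseline is a choice made for this sketch; swap the two columns in the pct_diff line if the other treatment is the reference.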

Related

Highlight positions without data in facet_wrap ggplot

When faceting bar plots in ggplot2, the x axis includes all factor levels. However, not all levels may be present in each group. In addition, zero values may be present, so from the bar plot alone it is not possible to distinguish x-axis values with no data from those with zero y values. Consider the following example:
library(tidyverse)
set.seed(43)
site <- c("A","B","C","D","E") %>% sample(20, replace = TRUE) %>% sort()
year <- c("2010","2011","2012","2013","2014","2010","2011","2012","2013","2014","2010","2012","2013","2014","2010","2011","2012","2014","2012","2014")
isZero = rbinom(n = 20, size = 1, prob = 0.40)
value <- ifelse(isZero==1, 0, rnorm(20,10,3)) %>% round(0)
df <- data.frame(site,year,value)
ggplot(df, aes(x=year, y=value)) +
  geom_bar(stat="identity") +
  facet_wrap(~site)
This is fish census data: not all sites were fished in all years, but sometimes no fish were caught. Hence the need to differentiate between the two situations. For example, there was no catch at site C in 2010 and it was not fished in 2011, and the reader cannot tell the difference. I would like to add something like "no data" to the plot for 2011. Maybe it is possible to fill in the rows where data is missing, generate another column with the text to add, and then include it via geom_text?
Here is an example of your proposed method:
# Tabulate sites vs. year; zero entries are site/year combinations with no rows
tab <- table(df$site, df$year)
idx <- which(tab == 0, arr.ind = TRUE)
# Build a new data.frame with one row per missing combination
# (named missing_df to avoid masking base R's missing())
missing_df <- data.frame(site = rownames(tab)[idx[, "row"]],
                         year = colnames(tab)[idx[, "col"]],
                         value = 1,
                         label = "N.D.")  # 'N.D.' for 'no data'
ggplot(df, aes(year, value)) +
  geom_col() +
  geom_text(data = missing_df, aes(label = label)) +
  facet_wrap(~site)
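The "fill in the missing rows" idea from the question can also be sketched with tidyr::complete(), which adds a row with value = NA for every site/year combination that is absent. This is a swapped-in technique rather than the approach above, and the small catch data frame is hypothetical:

```r
library(ggplot2)
library(tidyr)

# Hypothetical toy census: site A caught nothing in 2011 (value 0),
# site B was not fished in 2011 at all (no row)
catch <- data.frame(site = c("A", "A", "B"),
                    year = c("2010", "2011", "2010"),
                    value = c(5, 0, 3))

# complete() adds one row per missing site/year combination, with value = NA
catch_full <- complete(catch, site, year)

p_nd <- ggplot(catch_full, aes(year, value)) +
  geom_col(na.rm = TRUE) +  # NA rows draw no bar; a zero catch draws a flat bar
  geom_text(data = subset(catch_full, is.na(value)),
            aes(y = 1), label = "N.D.") +  # label only the added rows
  facet_wrap(~site)
```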
Alternatively, you could also let the facets omit unused x-axis values:
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site, scales = "free_x")

Plotting with ggplot respecting data order (y axis)

I'm new to ggplot2, so there are things I still don't understand; sorry in advance.
I'm plotting with ggplot2, but my y axis is not in the right order (i.e., the first column of the file is the last one to be plotted). Here is my code:
tab[[5]] %>%
gather(col_name, value, -An) %>%
ggplot(aes(factor(An), sort(col_name), fill = value == 1)) +
geom_tile(colour = 'black') +
scale_x_discrete(breaks=tab[[5]]$An, labels=substr(tab[[5]]$An, 3,4)) +
scale_y_discrete(labels=liste_BV$Nom_riviere) +
scale_fill_manual(values = c('white', "thistle"),
name="Disponiblité",
breaks=c("TRUE", "FALSE"),
labels=c("Oui", "Non")) +
xlab("Années") + ylab("Bassins versants") +
labs(fill = "Disponibilité") +
ggtitle(paste0("Données de ", var[[5]], " manquantes")) + theme(plot.title = element_text(hjust= 0.5))
I found other posts explaining how to convert the x axis into a factor, which is already done here, and I don't think the y axis needs to be.
I tried order, as suggested in other posts, but then the plot doesn't work at all.
The sort in front of col_name lets me plot the right data with the right y names; otherwise the data and names were mixed up and didn't match (I have no idea why, either).
The order should just be the other way around, matching the order of the columns in the object tab.
Thanks a lot for your help!
As others have mentioned, please provide a reproducible example so that someone can give you the best possible answer.
Coming to your question: as far as I understand, you need to convert your y-axis variable to a factor with the proper levels to get the desired effect. Here is a little demo.
library(ggplot2)
library(dplyr)
test_data <- tibble(col_a = c("a1", "a2", "a3"), col_b = c(1, 3, 5))
# this is what you seem to have now:
# x values are continuous and have a natural order
# y values are characters without an explicit order
# you have already converted one variable to a factor; do the same here
# say we want 'a1' at the top instead of the bottom,
# so we convert 'col_a' to a factor with levels a3 < a2 < a1
test_data %>%
  ggplot(data = ., aes(col_a, col_b)) +
  geom_bar(stat = "identity") +
  coord_flip()
# there are multiple ways to convert to a factor; this is one of them
test_data$col_a <- factor(test_data$col_a, levels = rev(c("a1", "a2", "a3")), ordered = TRUE)
test_data$col_a
[1] a1 a2 a3
Levels: a3 < a2 < a1
# now the plot starts with 'a1' instead of 'a3'
test_data %>%
ggplot(data =., aes(col_a,col_b)) +
geom_bar(stat = "identity") +
coord_flip()
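Applied to the question's layout, the same idea keeps the file's column order with the first column on top. Rather than sorting inside aes() (which reorders the col_name values independently of the other columns), the factor levels are set from the column names after reshaping. A sketch under assumptions: wide_tab and its river columns are hypothetical stand-ins for tab[[5]], and tidyr::pivot_longer() is used in place of gather():

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide table shaped like tab[[5]]: a year column (An)
# plus one 0/1 column per catchment, in file order
wide_tab <- data.frame(An = c(2001, 2002),
                       Riviere_Z = c(1, 0),
                       Riviere_A = c(0, 1),
                       Riviere_M = c(1, 1))

river_order <- setdiff(names(wide_tab), "An")  # file order: Z, A, M

long_tab <- pivot_longer(wide_tab, -An, names_to = "col_name", values_to = "value")

# rev() gives the first file column the highest level, so it is drawn at the top
long_tab$col_name <- factor(long_tab$col_name, levels = rev(river_order))

p_tiles <- ggplot(long_tab, aes(factor(An), col_name, fill = value == 1)) +
  geom_tile(colour = "black")
```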

Bar chart with two grouping variables using ggplot

The idea was to develop a chart that combines two other charts:
For the "Branco00" variable group, the chart should show the data as in the first chart. For the "Year" variable group, the data should be shown as in the second chart, i.e. as a "fill" by "Branco00" within "Year".
I tried:
g <- ggplot(tabDummy19, aes(X, Y, group = Year, fill = Branco00))
g + geom_col()
This gets close to what I am looking for; however, it does not separate the years into different bars but stacks them on the same ones.
Since it seems you intend to unstack the numeric bars and then organize them by Year, consider geom_bar(...) with a dodge position instead of geom_col(), and then move the grouping variable, Year, into a facet (facet_grid() or facet_wrap()).
To demonstrate, the data below uses the populations of Brazil's states (as your data seems to include), taken from Wikipedia's List of Brazilian states by population for 2000, 2010, and 2014. I include a Branco00 variable by stacking two identical copies of the data, one labelled "Branco" and one "Negro", so both groups have equal values for each state.
Data
txt = 'UF,Year,Population
SãoPaulo,2014,44035304
MinasGerais,2014,20734097
RiodeJaneiro,2014,16461173
Bahia,2014,15126371
RioGrandedoSul,2014,11207274
Paraná,2014,11081692
Pernambuco,2014,9277727
Ceará,2014,8842791
Pará,2014,8073924
Maranhão,2014,6850884
SantaCatarina,2014,6727148
Goiás,2014,6523222
Paraíba,2014,3943885
EspíritoSanto,2014,3885049
Amazonas,2014,3873743
RioGrandedoNorte,2014,3408510
Alagoas,2014,3321730
MatoGrosso,2014,3224357
Piauí,2014,3194718
DistritoFederal,2014,2852372
MatoGrossodoSul,2014,2619657
Sergipe,2014,2219574
Rondônia,2014,1748531
Tocantins,2014,1496880
Acre,2014,790101
Amapá,2014,750912
Roraima,2014,496936
SãoPaulo,2010,41262199
MinasGerais,2010,19597330
RiodeJaneiro,2010,15989929
Bahia,2010,14016906
RioGrandedoSul,2010,10693929
Paraná,2010,10444526
Pernambuco,2010,8796448
Ceará,2010,8452381
Pará,2010,7581051
Maranhão,2010,6574789
SantaCatarina,2010,6248436
Goiás,2010,6003788
Paraíba,2010,3766528
EspíritoSanto,2010,3512672
Amazonas,2010,3483985
RioGrandedoNorte,2010,3168027
Alagoas,2010,3120494
MatoGrosso,2010,3035122
Piauí,2010,3118360
DistritoFederal,2010,2570160
MatoGrossodoSul,2010,2449024
Sergipe,2010,2068017
Rondônia,2010,1562409
Tocantins,2010,1383445
Acre,2010,733559
Amapá,2010,669526
Roraima,2010,450479
SãoPaulo,2000,37032403
MinasGerais,2000,17891494
RiodeJaneiro,2000,14391282
Bahia,2000,13070250
RioGrandedoSul,2000,10187798
Paraná,2000,9569458
Pernambuco,2000,7918344
Ceará,2000,7430661
Pará,2000,6192307
Maranhão,2000,5651475
SantaCatarina,2000,5356360
Goiás,2000,5003228
Paraíba,2000,3443825
EspíritoSanto,2000,3097232
Amazonas,2000,2812557
RioGrandedoNorte,2000,2776782
Alagoas,2000,2822621
MatoGrosso,2000,2504353
Piauí,2000,2843278
DistritoFederal,2000,2051146
MatoGrossodoSul,2000,2078001
Sergipe,2000,1784475
Rondônia,2000,1379787
Tocantins,2000,1157098
Acre,2000,557526
Amapá,2000,477032
Roraima,2000,324397'
# STACK EQUAL-LENGTH BRANCO AND NEGRO DFS
brazil_pop_df <- rbind(transform(read.csv(text=txt, header=TRUE), Branco00 = "Branco"),
transform(read.csv(text=txt, header=TRUE), Branco00 = "Negro"))
Original output (reproducing the OP's structure)
library(ggplot2)
ggplot(brazil_pop_df, aes(UF, Population, group = Year, fill = Branco00)) +
geom_col()
Adjusted output (using the scales package to format the y axis, and rotating the x labels)
library(ggplot2)
library(scales)
ggplot(brazil_pop_df, aes(UF, Population, fill = Branco00)) +
geom_bar(stat="identity", position="dodge") +
scale_y_continuous(labels = comma, expand = c(0, 0), limits = c(0, 50000000)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_grid(. ~ Year)

ggplot2: create ordered group bar plot - (use reorder)

I want to create a grouped bar plot while keeping a specific order. For a single column rather than a grouped bar plot, the use of reorder() is obvious, but I'm not sure how to use it on a melted data.frame.
Here is a detailed explanation with a code example.
Let's say we have the following data.frame:
d.nfl <- data.frame(Team1=c("Vikings", "Chicago", "GreenBay", "Detroit"), Win=c(20, 13, 9, 12))
Plotting a simple bar plot and flipping it:
ggplot(d.nfl, aes(x = Team1, y=Win)) + geom_bar(aes(fill=Team1), stat="identity") + coord_flip()
The plot above has no particular order. To order it by Win, I can do the following:
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)
ggplot(d.nfl, aes(x = orderedTeam, y=Win)) + geom_bar(aes(fill=orderedTeam), stat="identity") + coord_flip()
Now let's say we add another column (to the original data frame):
d.nfl$points <- c(12, 3, 45, 5)
Team1 Win points
1 Vikings 20 12
2 Chicago 13 3
3 GreenBay 9 45
4 Detroit 12 5
To generate a grouped bar plot, we first need to melt it:
library(reshape2)
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
ggplot(d.nfl.melt,aes(x = Team1,y = value)) + geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
The ggplot above is unordered.
How do I make an ordered grouped bar plot (in ascending order)?
This is a non-issue.
The easiest way is to not discard your ordered team in the melt:
d.nfl.melt <- melt(d.nfl,id.vars = c("Team1", "orderedTeam"))
Alternatively, we can use reorder after melting and just only use the Win elements in computing the ordering:
d.nfl.melt$ordered_after_melting = reorder(
d.nfl.melt$Team1,
X = d.nfl.melt$value * (d.nfl.melt$variable == "Win")
)
Yet another idea is to take the levels from the original ordered column and apply them to a melted factor:
d.nfl.melt$copied_levels = factor(
d.nfl.melt$Team1,
levels = levels(d.nfl$orderedTeam)
)
All three methods give the same result. (I left out the coord_flips because they don't add anything to the question, but you can of course add them back in.)
gridExtra::grid.arrange(
ggplot(d.nfl.melt,aes(x = orderedTeam, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = ordered_after_melting, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = copied_levels, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity")
)
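The claim that the three methods agree can also be checked without plotting, by comparing factor levels. A sketch in base R; the long data frame is built by hand as a hypothetical stand-in for reshape2::melt(), so the check has no extra dependencies:

```r
# Original data from the question plus the ordered column
d.nfl <- data.frame(Team1 = c("Vikings", "Chicago", "GreenBay", "Detroit"),
                    Win = c(20, 13, 9, 12),
                    points = c(12, 3, 45, 5))
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)

# Hand-built long format (stand-in for melt): one row per team and variable
long <- data.frame(Team1 = rep(d.nfl$Team1, 2),
                   orderedTeam = rep(d.nfl$orderedTeam, 2),  # method 1: kept through the reshape
                   variable = rep(c("Win", "points"), each = nrow(d.nfl)),
                   value = c(d.nfl$Win, d.nfl$points))

# Method 2: reorder after reshaping; non-Win rows contribute 0 to each team's mean
m2 <- reorder(long$Team1, long$value * (long$variable == "Win"))

# Method 3: copy the levels of the original ordered column
m3 <- factor(long$Team1, levels = levels(d.nfl$orderedTeam))

levels(long$orderedTeam)  # "GreenBay" "Detroit" "Chicago" "Vikings"
```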
As for the easiest option, I would recommend keeping the orderedTeam variable while melting. Your code seems to work hard to leave it out, but it's quite easy to keep it in.
The challenge your question presents is how to reorder the factor Team1 based on a subset of the values in a melted column.
The comments on your question from @alistaire and @joran link to great answers.
The tl;dr answer is to apply the ordering from your original, unmelted data.frame to the new one using levels().
library(reshape2)
#Picking up from your example code:
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
levels(d.nfl.melt$Team1)
#Current order is alphabetical
#(in R >= 4.0, Team1 is a character column by default, so this returns NULL instead)
#[1] "Chicago" "Detroit" "GreenBay" "Vikings"
#Reorder based on Wins (using the same order from your earlier, unmelted data.frame)
d.nfl.melt$Team1 <- factor(d.nfl.melt$Team1, levels = levels(d.nfl$orderedTeam)) #SOLUTION
levels(d.nfl.melt$Team1)
#New order is ascending by wins
#[1] "GreenBay" "Detroit" "Chicago" "Vikings"
ggplot(d.nfl.melt,aes(x = Team1,y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()

Visualizing time frame data expressed in ranges of dates

I would like to visualize the time frames of the five projects given below. Currently I am using the OpenOffice Draw application and producing the graph manually, but I am not satisfied. Could you help me with the following? Thank you.
1. How can I produce similar graphs using R (or Excel) with better precision in terms of days?
2. Is there a better way to visualize the data? If so, please let me know how to produce it using R or Excel.
Project   Time
-------   -------------------
A         Feb 15 – March 1
B         March 15 – June 15
C         Feb 1 – March 15
D         April 10 – May 15
E         March 1 – June 30
ggplot2 provides a (reasonably) straightforward way to construct a plot.
First you need to get your data into R. You want your start and end dates to be in a date format in R (I have used Date).
library(ggplot2)
library(scales) # for date formatting with ggplot2
DT <- data.frame(Project = LETTERS[1:5],
                 start = as.Date(ISOdate(2012, c(2,3,2,4,3), c(15,15,1,10,1))),
                 end = as.Date(ISOdate(2012, c(3,6,3,5,6), c(1,15,15,15,30))))
# it is useful to have a numeric version of the Project column
# (converted via factor(): in R >= 4.0, Project is a character column)
DT$ProjectN <- as.numeric(factor(DT$Project))
You will also want to calculate where to put the text; I will use ddply from the plyr package:
library(plyr)
# find the midpoint date for each project
DTa <- ddply(DT, .(ProjectN, Project), summarize, mid = mean(c(start,end)))
You want to create rectangles for each project, for which you can use geom_rect, and text labels at each midpoint.
Here is an example of how to build the plot:
ggplot(DT) +
  geom_rect(aes(colour = Project, ymin = ProjectN - 0.45,
                ymax = ProjectN + 0.45, xmin = start, xmax = end), fill = NA) +
  scale_colour_hue(guide = 'none') +  # this removes the legend
  geom_text(data = DTa, aes(label = Project, y = ProjectN, x = mid, colour = Project),
            inherit.aes = FALSE) +
  # now some prettying up to remove text / axis ticks
  theme(panel.background = element_blank(),
        axis.ticks.y = element_blank(), axis.text.y = element_blank()) +
  # and add date labels
  scale_x_date(labels = date_format('%b %d'),
               breaks = sort(unique(c(DT$start, DT$end)))) +
  # remove axis labels
  labs(y = NULL, x = NULL)
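The midpoints computed with ddply() can also be obtained in base R, which avoids the plyr dependency. This sketch writes the dates from the table in ISO format (the year 2012 is assumed, as in the answer) and uses a separate, hypothetical data frame name, proj, so it does not overwrite DT:

```r
# Project dates from the table, year 2012 assumed
proj <- data.frame(Project = LETTERS[1:5],
                   start = as.Date(c("2012-02-15", "2012-03-15", "2012-02-01",
                                     "2012-04-10", "2012-03-01")),
                   end = as.Date(c("2012-03-01", "2012-06-15", "2012-03-15",
                                   "2012-05-15", "2012-06-30")))

# Duration in days (Date subtraction gives a difftime in days)
proj$days <- as.numeric(proj$end - proj$start)

# Midpoint: start plus half the duration (may fall on a half day)
proj$mid <- proj$start + proj$days / 2

proj$days  # 15 92 43 35 121 (2012 is a leap year)
```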
You could also check the gantt.chart function in the plotrix package.
library(plotrix)
?gantt.chart
Here is one implementation
dmY.format<-"%d/%m/%Y"
gantt.info<-list(
labels= c("A","B","C","D","E"),
starts= as.Date(c("15/02/2012", "15/03/2012", "01/02/2012", "10/04/2012","01/03/2012"),
format=dmY.format),
ends= as.Date(c("01/03/2012", "15/06/2012", "15/03/2012", "15/05/2012","30/06/2012"),
format=dmY.format)
)
vgridpos<-as.Date(c("01/01/2012","01/02/2012","01/03/2012","01/04/2012","01/05/2012","01/06/2012","01/07/2012","01/08/2012"),format=dmY.format)
vgridlab<-
c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug")
gantt.chart(gantt.info, xlim= c(as.Date("01/01/2012",format=dmY.format), as.Date("01/08/2012",format=dmY.format)) , main="Projects duration",taskcolors=FALSE, border.col="black",
vgridpos=vgridpos,vgridlab=vgridlab,hgrid=TRUE)
I also tried ggplot2, but mnel was faster than me. Here is my code:
data1 <- as.data.frame(gantt.info)
data1$order <- 1:nrow(data1)
library(ggplot2)
ggplot(data1, aes(xmin = starts, xmax = ends, ymin = order, ymax = order + 0.5)) +
  geom_rect(color = "black", fill = NA) +  # outlined, unfilled rectangles
  theme_bw() +
  geom_text(aes(x = starts + (ends - starts) / 2, y = order + 0.25, label = labels)) +
  ylab("Projects") + xlab("Date")

Resources