ggplot ordering a clustered barplot - r

Code to reproduce the issue I have:
library("data.table")
library("ggplot2")
DT<-data.table(team=c("Q1","Q2","Q3"), mon=c(3,5,2), tues=c(4,2,1), weds=c(4,2,5))
DT<-melt(DT,id.vars = "team", measure.vars = c("mon","tues","weds"))
chartdata<-DT[,.(team, day=variable, score=value)]
ggplot(chartdata, aes(fill=day, y=score, x=team)) +
geom_bar(position="dodge", stat="identity")
This produces a clustered barplot. I need to set the order by Monday's score (descending), but can't see a way of doing this. I have tried:
ggplot(chartdata, aes(fill=day, y=score, x=reorder(team, {-score}))) +
geom_bar(position="dodge", stat="identity")
but this appears to sort the data by the totals for Monday to Wednesday, not by Monday alone as I want.
Is this possible? Many thanks!

You can sort your data frame before passing it to ggplot2 and fix the factor levels of the variable used for the x axis:
library(dplyr)
library(ggplot2)
chartdata %>%
arrange(day, -score) %>%
mutate(team = factor(team, unique(team))) %>%
ggplot(aes(x = team, y = score, fill = day))+
geom_col(position = position_dodge())
Is this what you are looking for?
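As for why the reorder() attempt in the question sorts by the wrong quantity: reorder() summarises its second argument per level with mean() by default, so reorder(team, -score) ranks teams by their average score across all three days. The sketch below uses only base R and the question's numbers, typed out by hand in long format, to contrast that with taking the order from Monday's rows alone. The names monday and team_order are illustrative and do not come from the original post.

```r
# The question's data in long format (values copied from the question)
chartdata <- data.frame(
  team  = rep(c("Q1", "Q2", "Q3"), times = 3),
  day   = factor(rep(c("mon", "tues", "weds"), each = 3)),
  score = c(3, 5, 2, 4, 2, 1, 4, 2, 5)
)

# reorder() uses mean() per team by default, so all three days contribute
levels_by_mean <- levels(reorder(chartdata$team, -chartdata$score))
levels_by_mean
# "Q1" "Q2" "Q3"

# Take the order from Monday's rows only, highest score first
monday     <- chartdata[chartdata$day == "mon", ]
team_order <- monday$team[order(-monday$score)]
chartdata$team <- factor(chartdata$team, levels = team_order)
levels(chartdata$team)
# "Q2" "Q1" "Q3"

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(chartdata, aes(x = team, y = score, fill = day)) +
    geom_col(position = "dodge")
}
```

Because the factor levels are set before plotting, the x axis follows Monday's ranking regardless of the other days' scores.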

Related

Maintain order of dataframe for a stacked barplot using ggplot2

Using the following dataframe and ggplot...
sample ="BC04"
df<- data.frame(Name=c("Pseudomonas veronii", "Pseudomonas stutzeri", "Janthinobacterium lividum", "Pseudomonas viridiflava"),
Abundance=c(7.17, 4.72, 3.44, 3.33))
ggplot(data=df, aes(x=sample, y=Abundance, fill=Name)) +
geom_bar(stat="identity")
... creates the following graph
[image: barplot]
Although geom_bar() is called with stat="identity", it still ignores the order of the rows in the dataframe. I would like to get a stack order based on the Abundance percentage (highest percentage at the top, in ascending order).
Previously, strings passed to ggplot were evaluated with aes_string (which is now deprecated). Now, we convert the string to a symbol and evaluate it with !!:
library(ggplot2)
ggplot(data=df, aes(x= !! rlang::sym(sample), y=Abundance, fill=Name)) +
geom_bar(stat="identity")
Another option is the .data pronoun:
ggplot(data=df, aes(x= .data[[sample]], y=Abundance, fill=Name)) +
geom_bar(stat="identity")
Update
Judging by the plot, the OP may have created a column named 'sample'. In that case, we reorder 'Name' by descending 'Abundance':
library(dplyr) # for desc()
df$sample <- "BC04"
ggplot(data = df, aes(x = sample, y = Abundance,
fill = reorder(Name, desc(Abundance)))) +
geom_bar(stat = 'identity')+
guides(fill = guide_legend(title = "Name"))
Another option is to convert 'Name' to a factor whose levels are the unique elements of 'Name' in their existing order (this works because the data is already arranged in descending order of 'Abundance'):
library(dplyr)
df %>%
mutate(Name = factor(Name, levels = unique(Name))) %>%
ggplot(aes(x = sample, y = Abundance, fill = Name)) +
geom_bar(stat = 'identity')

How to make a dual axis in ggplot R

I have made a time series plot of total count data for 4 different species. As you can see, the sharksucker counts are much higher than those of the other 3 species. To see the trends of the other 3 species, they need to be plotted separately (or on a smaller y axis). However, I have a figure limit in my master's paper, so I was trying to create a dual axis plot or split the y axis in two. Does anyone know of a way I could do this?
library(tidyverse)
library(readxl) # read_xlsx() comes from readxl, which library(tidyverse) does not attach
library(reshape2)
dat <- read_xlsx("ReefPA.xlsx")
dat1 <- dat
dat1$Date <- format(dat1$Date, "%Y/%m")
plot_dat <- dat1 %>%
group_by(Date) %>%
summarise(Sharksucker_Remora = sum(Sharksucker_Remora)) %>%
melt("Date") %>%
filter(Date > '2018-01-01') %>%
arrange(Date)
names(plot_dat) <- c("Date", "Species", "Count")
ggplot(data = plot_dat) +
geom_line(mapping = aes(x = Date, y = Count, group = Species, colour = Species)) +
stat_smooth(method=lm, aes(x = Date, y = Count, group = Species, colour = Species)) +
scale_colour_manual(values=c(Golden_Trevally="goldenrod2", Red_Snapper="firebrick2", Sharksucker_Remora="darkolivegreen3", Juvenile_Remora="aquamarine2")) +
xlab("Date") +
ylab("Total Presence Per Month") +
theme(legend.title = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
The problem you're trying to solve doesn't look like a second y axis issue. It is a problem of relative scale between the species. You might want to think about something like standardizing each species' initial presence to 100 and showing growth or decline from there.
Another option would be faceting by species.
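Both suggestions can be sketched in a few lines. The counts below are made up for illustration (the original ReefPA.xlsx is not available), and the Index column name is an assumption. The example rescales each species so that its first observation equals 100, then shows faceting with free y scales as the alternative.

```r
# Hypothetical monthly counts for two species on very different scales
plot_dat <- data.frame(
  Date    = rep(c("2018/02", "2018/03", "2018/04"), times = 2),
  Species = rep(c("Sharksucker_Remora", "Red_Snapper"), each = 3),
  Count   = c(200, 250, 150, 4, 6, 5)
)

# Index each species to its first observation (= 100).
# Rows must already be in date order within each species.
plot_dat$Index <- ave(plot_dat$Count, plot_dat$Species,
                      FUN = function(x) 100 * x / x[1])

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Option 1: all species on one panel, on a common relative scale
  p_index <- ggplot(plot_dat,
                    aes(x = Date, y = Index, group = Species, colour = Species)) +
    geom_line()
  # Option 2: one panel per species, each with its own y axis
  p_facet <- ggplot(plot_dat, aes(x = Date, y = Count, group = Species)) +
    geom_line() +
    facet_wrap(~ Species, scales = "free_y")
}
```

Faceting still counts as a single figure, which may help with a figure limit.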

Plotting the means in ggplot, without using stat_summary()

In ggplot, I want to compute the means (per group) and plot them as points. I would like to do that with geom_point(), and not stat_summary().
Here are my data.
group = rep(c('a', 'b'), each = 3)
grade = 1:6
df = data.frame(group, grade)
# this does the job
ggplot(df, aes(group, grade)) +
stat_summary(fun = 'mean', geom = 'point') # fun.y was renamed to fun in ggplot2 3.3.0
# but this does not
ggplot(df, aes(group, grade)) +
geom_point(stat = 'mean')
What values can the stat argument above take?
Is it possible to compute the means, using geom_point(), without computing a new data frame?
You could do
ggplot(df, aes(group, grade)) +
geom_point(stat = 'summary', fun = "mean") # use fun.y instead on ggplot2 versions before 3.3.0
But in general it's really not a great idea to rely on ggplot to do your data manipulation for you. Just let ggplot take care of the plotting. You can use packages like dplyr to help with the summarizing:
library(dplyr)
df %>% group_by(group) %>%
summarize(grade=mean(grade)) %>%
ggplot(aes(group, grade)) +
geom_point()
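If adding a dependency is not desirable, the same summarise-then-plot idea can be sketched in base R with aggregate(). This is an alternative to the dplyr pipeline above, not something the original answer proposed, and the name means is illustrative.

```r
group <- rep(c("a", "b"), each = 3)
grade <- 1:6
df <- data.frame(group, grade)

# One row per group with the mean grade
means <- aggregate(grade ~ group, data = df, FUN = mean)
means$grade
# 2 5

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(means, aes(group, grade)) +
    geom_point()
}
```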

Order bars by difference between variables

My intention is to plot a bar chart with two variables visible,
"HH_FIN_EX" and "ACT_IND_CON_EXP", but ordered by the variable diff, in ascending order. diff itself should not be included in the chart.
library(eurostat)
library(tidyverse)
#getting the data
data1 <- get_eurostat("nama_10_gdp",time_format = "num")
#filtering
data_1_4 <- data1 %>%
filter(time=="2016",
na_item %in% c("B1GQ", "P31_S14_S15", "P41"),
geo %in% c("BE","BG","CZ","DK","DE","EE","IE","EL","ES","FR","HR","IT","CY","LV","LT","LU","HU","MT","NL","AT","PL","PT","RO","SI","SK","FI","SE","UK"),
unit=="CP_MEUR")%>% select(-unit, -time)
#transformations and calculations
data_1_4 <- data_1_4 %>%
spread(na_item, values)%>%
na.omit() %>%
mutate(HH_FIN_EX = P31_S14_S15/B1GQ, ACT_IND_CON_EXP=P41/B1GQ, diff=ACT_IND_CON_EXP-HH_FIN_EX) %>%
gather(na_item, values, 2:7)%>%
filter(na_item %in% c("HH_FIN_EX", "ACT_IND_CON_EXP", "diff"))
#plotting
ggplot(data=data_1_4, aes(x=reorder(geo, values), y=values, fill=na_item))+
geom_bar(stat="identity", position=position_dodge(), colour="black")+
labs(title="", x="Countries", y="As percentage of GDP")
I would appreciate any suggestions on how to do this, as aes(x=reorder(geo, values[values=="diff"]) results in an error.
First of all, you shouldn't include diff (your result column) when using gather; it complicates things.
Change the line gather(na_item, values, 2:7) to gather(na_item, values, 2:6).
You can use this code to calculate the difference and order the rows (using dplyr::arrange) in descending order:
plotData <- data_1_4 %>%
spread(na_item, values) %>%
na.omit() %>%
mutate(HH_FIN_EX = P31_S14_S15 / B1GQ,
ACT_IND_CON_EXP = P41 / B1GQ,
diff = ACT_IND_CON_EXP - HH_FIN_EX) %>%
gather(na_item, values, 2:6) %>%
filter(na_item %in% c("HH_FIN_EX", "ACT_IND_CON_EXP")) %>%
arrange(desc(diff))
And plot it with:
ggplot(plotData, aes(geo, values, fill = na_item))+
geom_bar(stat = "identity", position = "dodge", color = "black") +
labs(x = "Countries",
y = "As percentage of GDP") +
scale_x_discrete(limits = unique(plotData$geo)) # each country appears twice, so drop duplicates
You can explicitly work out the order that you want (stored in country_order below) and force the factor geo to have its levels in that order. Then run ggplot after filtering out the diff variable. So replace your call to ggplot with the following:
country_order = (data_1_4 %>% filter(na_item == 'diff') %>% arrange(values))$geo
data_1_4$geo = factor(data_1_4$geo, country_order)
ggplot(data=filter(data_1_4, na_item != 'diff'), aes(x=geo, y=values, fill=na_item))+
geom_bar(stat="identity", position=position_dodge(), colour="black")+
labs(title="", x="Countries", y="As percentage of GDP")
Doing this, I get the plot below:
Is this what you are looking for?
data_1_4 %>% mutate(Val = fct_reorder(geo, values, .desc = TRUE)) %>%
filter(na_item %in% c("HH_FIN_EX", "ACT_IND_CON_EXP")) %>%
ggplot(aes(x=Val, y=values, fill=na_item)) +
geom_bar(stat="identity", position=position_dodge(), colour="black") +
labs(title="", x="Countries", y="As percentage of GDP")

ggplot fill does not work - no errors [MRE]

The ggplot analysis below is intended to show the number of survey responses by date. I'd like to color the bars by the three survey administrations (the Admini variable). While no errors are thrown, the bars are not colored.
Can anyone point out how/why my bars are not color-coded? THANKS!
library(ggplot2)
library(dplyr)
library(RCurl)
OSTadminDates2<-getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/OSTadminDates.csv")
OSTadminDates<-read.csv(text=OSTadminDates2)
ndate1<-as.Date(OSTadminDates$Date,"%m/%d/%y");ndate1
SurvAdmin<-as.factor(OSTadminDates$Admini)
R<-ggplot(data=OSTadminDates,aes(x=ndate1),fill=Admini,group=1) +
geom_bar(stat = "count",width = .5 )
R
Here's a work-around you could use:
library(ggplot2)
library(dplyr)
library(RCurl)
OSTadminDates2<-getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/OSTadminDates.csv")
OSTadminDates<-read.csv(text=OSTadminDates2)
OSTadminDates$Date<-as.Date(OSTadminDates$Date,"%m/%d/%y")
OSTadminDates$Admini <- factor(OSTadminDates$Admini)
df <- OSTadminDates %>%
group_by(Date, Admini) %>%
summarise(n = n())
ggplot(data = df) +
geom_bar(aes(x = Date, y = n, fill = Admini), stat = "identity")
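As for why the original call produced uncolored bars without an error: fill=Admini was written after the closing parenthesis of aes(), so it was passed to ggplot() itself rather than being mapped as an aesthetic, and it appears to have been ignored silently. A minimal sketch of the difference follows, using a small made-up data frame (survey and its values are hypothetical, since the CSV may not be reachable).

```r
# Hypothetical stand-in for OSTadminDates
survey <- data.frame(
  Date   = as.Date(c("2017-01-10", "2017-01-10", "2017-03-02", "2017-05-15")),
  Admini = factor(c(1, 1, 2, 3))
)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Pattern from the question (fill outside aes(), so no colors):
  #   ggplot(survey, aes(x = Date), fill = Admini) + geom_bar()
  # Moving fill inside aes() maps it to the Admini column:
  p_filled <- ggplot(survey, aes(x = Date, fill = Admini)) +
    geom_bar(width = 0.5)
}
```

With fill inside aes(), geom_bar() can keep its default count statistic, so the pre-aggregation step in the work-around becomes optional.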
