geom_histogram with proportions and factor data

geom_histogram with proportions and factor data - r

I'm trying to consistently plot histograms for zonal statistics from a thematic map. The data within a single zone often looks something like this:
dat <- data.frame("CLASS" = sample(LETTERS[1:6], 250, replace = TRUE,
prob = c(.15, .06, .35, .4, .02, 0)))
dat$CLASS <- factor(dat$CLASS, levels = LETTERS[1:6], ordered = T)
wherein not all possible classes may have been present in the zone.
I can pre-compute the data summary and use geom_bar and a manual colour scale to get consistent bar colours regardless of missing data:
library(dplyr)
library(ggplot2)
library(viridis)
dat_summ <- dat %>%
group_by(CLASS, .drop = FALSE) %>%
summarise(percentage = n() / nrow(.) * 100)
mancols <- viridis_pal()(6)
names(mancols) <- LETTERS[1:6]
ggplot(dat_summ) +
geom_bar(aes(x = CLASS, y = percentage, fill = CLASS),
stat = 'identity', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = mancols, drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
But I can't keep the colours consistent across plots when I try to use geom_histogram:
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = ..x..), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_c() +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
If any of the outside-edge columns (A, F) are count = 0, the colours rescale to where data is present. This doesn't happen if there's a gap in one of the middle classes. Using scale_fill_viridis_b() doesn't solve the problem - it always rescales the palette against the number of non-0 columns.
Is it possible to prevent this behaviour and output consistent colours no matter which columns are count = 0, or am I stuck with my geom_bar approach?

Maybe scale_fill_discrete/scale_fill_viridis_d(drop = F) is what you want (with fill = CLASS).
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_d(drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())

I think that the problem is that you pass the calculated variable ..x.. to fill in the aesthetics. It appears the length of this variable changes with your data set. You could replace it with scale_fill_manual and you will get the same plot colours regardless of how many levels there are in your CLASS variable:
ggplot(dat) +
geom_histogram(aes(x = CLASS, y = stat(count/sum(count) * 100), fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = c("#FF0000FF", "#CCFF00FF", "#00FF66FF", "#0066FFFF", "#CC00FFFF", "#FF99FFFF"))
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())

Related

How to automatically change label color depending on relative values (maximum/minimum)?

In order to make a dynamic visualization, for example in a dashboard, I want to display the label colors (percentages or totals) depending on their real values in black or white.
As you can see from my reprex below, I changed the color of the label with the highest percentage manually to black, in order gain a better visability.
Is there a was, to automatically implement the label color? The label with the highest percentage corresponding should always be black, if data is changing over time.
library(ggplot2)
library(dplyr)
set.seed(3)
reviews <- data.frame(review_star = as.character(sample.int(5,400, replace = TRUE)),
stars = 1)
df <- reviews %>%
group_by(review_star) %>%
count() %>%
ungroup() %>%
mutate(perc = `n` / sum(`n`)) %>%
arrange(perc) %>%
mutate(labels = scales::percent(perc))
ggplot(df, aes(x = "", y = perc, fill = review_star)) +
geom_col(color = "black") +
geom_label(aes(label = labels), color = c( "white", "white","white",1,"white"),
position = position_stack(vjust = 0.5),
show.legend = FALSE) +
guides(fill = guide_legend(title = "Answer")) +
scale_fill_viridis_d() +
coord_polar(theta = "y") +
theme_void()

you can set the colors using replace(rep('white', nrow(df)), which.max(df$perc), 'black').
ggplot(df, aes(x = "", y = perc, fill = review_star)) +
geom_col(color = "black") +
geom_label(aes(label = labels),
color = replace(rep('white', nrow(df)), which.max(df$perc), 'black'),
position = position_stack(vjust = 0.5),
show.legend = FALSE) +
guides(fill = guide_legend(title = "Answer")) +
scale_fill_viridis_d() +
coord_polar(theta = "y") +
theme_void()

R Windrose percent label on figure

I am using the windrose function posted here: Wind rose with ggplot (R)?
I need to have the percents on the figure showing on the individual lines (rather than on the left side), but so far I have not been able to figure out how. (see figure below for depiction of goal)
Here is the code that makes the figure:
p.windrose <- ggplot(data = data,
aes(x = dir.binned,y = (..count..)/sum(..count..),
fill = spd.binned)) +
geom_bar()+
scale_y_continuous(breaks = ybreaks.prct,labels=percent)+
ylab("")+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica")
I marked up the figure I have so far with what I am trying to do! It'd be neat if the labels either auto-picked the location with the least wind in that direction, or if it had a tag for the placement so that it could be changed.
I tried using geom_text, but I get an error saying that "aesthetics must be valid data columns".
Thanks for your help!

One of the things you could do is to make an extra data.frame that you use for the labels. Since the data isn't available from your question, I'll illustrate with mock data below:
library(ggplot2)
# Mock data
df <- data.frame(
x = 1:360,
y = runif(360, 0, 0.20)
)
labels <- data.frame(
x = 90,
y = scales::extended_breaks()(range(df$y))
)
ggplot(data = df,
aes(x = as.factor(x), y = y)) +
geom_point() +
geom_text(data = labels,
aes(label = scales::percent(y, 1))) +
scale_x_discrete(breaks = seq(0, 1, length.out = 9) * 360) +
coord_polar() +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())

#teunbrand answer got me very close! I wanted to add the code I used to get everything just right in case anyone in the future has a similar problem.
# Create the labels:
x_location <- pi # x location of the labels
# Get the percentage
T_data <- data %>%
dplyr::group_by(dir.binned) %>%
dplyr::summarise(count= n()) %>%
dplyr::mutate(y = count/sum(count))
labels <- data.frame(x = x_location,
y = scales::extended_breaks()(range(T_data$y)))
# Create figure
p.windrose <- ggplot() +
geom_bar(data = data,
aes(x = dir.binned, y = (..count..)/sum(..count..),
fill = spd.binned))+
geom_text(data = labels,
aes(x=x, y=y, label = scales::percent(y, 1))) +
scale_y_continuous(breaks = waiver(),labels=NULL)+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
ylab("")+xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica") +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())

Bar plot ggplot2 - Error: Aesthetics must be either length 1 or the same as the data (150): fill, x, y

Hey I know there are lots of questions about this particular error but i still cant find what is wrong, pretty new to R and coding in general.
here is a link to may data
https://www.dropbox.com/s/qfo5rp7ywgsgxhy/CRERDATA.csv?dl=0
and here is my code to make the graph
not all used for graph obviously
library(car)
library(ggplot2)
library(Rmisc)
library(dunn.test)
library(FSA)
summarizes my data so i can get standard error bars
index_sum <- summarySE(CRERDATA, measurevar = "index", groupvars = c("site", "scenario"), na.rm = TRUE)
graph code
index_graph <- ggplot(CRERDATA, aes(x = index_sum$site, y = index_sum$index, fill = index_sum$scenario)) +
geom_bar(aes(fill = index_sum$scenario), position = position_dodge(), stat="identity") + ylab("Bleaching index") + xlab("Sites") +
labs(fill = "scenario") + scale_fill_grey() + theme_minimal() +
geom_errorbar(aes(ymin = index_sum$index-se, ymax = index_sum$index+se), width = .2, position = position_dodge(.9), color = "red")

You have specified CRERDATA in the ggplot as data, when you are actually using index_sum.
Load necessary packages:
library(ggplot2)
library(Rmisc)
Read data and summarise:
CRERDATA <- read.csv('CRERDATA.csv')
index_sum <- summarySE(CRERDATA, measurevar = "index", groupvars = c("site", "scenario"), na.rm = TRUE)
Instead of CRERDATA, use index_sum. You then don't need to use the dollar sign $ to access columns:
ggplot(index_sum, aes(x = site, y = index, fill = scenario)) +
geom_bar(aes(fill = scenario), position = position_dodge(), stat="identity") +
ylab("Bleaching index") + xlab("Sites") + labs(fill = "scenario") + scale_fill_grey() +
theme_minimal() +
geom_errorbar(aes(ymin = index-se, ymax = index+se), width = .2, position = position_dodge(.9), color = "red")
The result:

Duplicate items in ggplot2 legend

I have produced a plot with stacked columns and two lines in ggplot2. However, the legend items of the lines also show in the legend of the columns. Any one knows how to remove them from the column legend?
Code below:
##Remove Objects
rm(list=ls(all=TRUE))
##Load packages
library(ggplot2)
library(dplyr)
library(reshape)
##Data
set.seed(12345)
d.fig6.1 <- data.frame(mm=c("Jan","Feb","Mar","Apr","May","Jun",
"Jul","Aug","Sep","Oct","Nov","Dec"),a.1=(rnorm(12)*5)^2)
d.fig6.1$a.2 <- (rnorm(12)*5)^2
d.fig6.1$a.3 <- (rnorm(12)*5)^2
d.fig6.1$a.4 <- (rnorm(12)*5)^2
d.fig6.1$a.5 <- (rnorm(12)*5)^2
d.fig6.1$a.6 <- (rnorm(12)*5)^2
d.fig6.1$a.7 <- (rnorm(12)*5)^2
d.fig6.2 <- data.frame(mm=c("Jan","Feb","Mar","Apr","May","Jun",
"Jul","Aug","Sep","Oct","Nov","Dec"),a.8=(rnorm(12)*5)^2)
d.fig6.2$a.9 <- (rnorm(12)*5)^2
d.fig6.1 <- melt(d.fig6.1,id="mm")
d.fig6.2 <- melt(d.fig6.2,id="mm")
d.fig6.1
d.fig6.2
##Plot
theme_set(theme_bw(7)) #25
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2",
"#D55E00", "#CC79A7","red")
sp.6 <- ggplot(d.fig6.1, aes(x=mm, y=value, fill=variable)) + geom_col()
+ labs(x="") + labs(y="[Units]")
+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
+ scale_fill_manual(values=cbPalette,name="")
+ geom_text(data=d.fig6.1, aes(label = round(value,digits=2)), position = position_stack(vjust=0.5), size=2)
+ theme(legend.title = element_blank())
+ geom_line(data=d.fig6.2, aes(x=as.numeric(mm), y=value, color=variable),size=1,inherit.aes = FALSE)
+ geom_text(data=d.fig6.2, aes(label=round(value,digits=2)),hjust=0, vjust=0, size=2.5)
sp.6

Move fill = variable out of the top level ggplot(aes(...)) mapping. Keep only the aesthetic mappings common to all geoms there. This way you don't really need inherit.aes = FALSE, either:
ggplot(d.fig6.1,
aes(x = mm, y = value, label = round(value, digits = 2))) +
geom_col(aes(fill = variable)) +
geom_text(position = position_stack(vjust = 0.5), size = 2) +
geom_line(data = d.fig6.2,
aes(color = variable, group = variable),
size = 1) +
geom_text(data = d.fig6.2,
hjust = 0, vjust = 0, size = 2.5) +
labs(x = "", y = "[Units]") +
scale_fill_manual(values = cbPalette, name="") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.title = element_blank())
Explanation: By including fill = variable in the top level aesthetic mapping, every geom that doesn't have inherit.aes = FALSE set explicitly will inherit this as part of the aesthetic mapping for its level.
In this case, geom_line & the second geom_text both use a different data source (d.fig6.2 instead of d.fig6.1), but geom_text still inherits fill = variable from the top level mapping. So the variable values from d.fig6.2 ("a.8" & "a.9") are added to the fill palette, even though they aren't used anywhere.
I stripped down your code for a more minimal example that reproduces the problem below:
# this will result in a.8 & a.9 in the column legend
# because geom_text inherits aes(fill = variable) from ggplot()
ggplot(d.fig6.1,
aes(x = mm, y = value, fill = variable,
label = round(value, digits = 2))) +
geom_col() +
geom_text(data = d.fig6.2)
# this will not
# because aes(fill = variable) is moved from ggplot() to geom_col()
ggplot(d.fig6.1,
aes(x = mm, y = value,
label = round(value, digits = 2))) +
geom_col(aes(fill = variable)) +
geom_text(data = d.fig6.2)
# this will not, either
# because geom_text() includes all the aes mappings it requires, & inherit.aes = FALSE
ggplot(d.fig6.1,
aes(x = mm, y = value, fill = variable,
label = round(value, digits = 2))) +
geom_col() +
geom_text(data = d.fig6.2,
aes(x = mm, y = value,
label = round(value, digits = 2)),
inherit.aes = FALSE)

Try adding + guides(color = FALSE) at the end.

Aesthetics must be either length 1 or the same as the data (1): x, y, label

I'm working on some data on party polarization (something like this) and used geom_dumbbell from ggalt and ggplot2. I keep getting the same aes error and other solutions in the forum did not address this as effectively. This is my sample data.
df <- data_frame(policy=c("Not enough restrictions on gun ownership", "Climate change is an immediate threat", "Abortion should be illegal"),
Democrats=c(0.54, 0.82, 0.30),
Republicans=c(0.23, 0.38, 0.40),
diff=sprintf("+%d", as.integer((Democrats-Republicans)*100)))
I wanted to keep order of the plot, so converted policy to factor and wanted % to be shown only on the first line.
df <- arrange(df, desc(diff))
df$policy <- factor(df$policy, levels=rev(df$policy))
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
x[2:length(x)] <- sub("%$", "", x[2:length(x)])
x
}
Then I used ggplot that rendered something close to what I wanted.
gg2 <- ggplot()
gg2 <- gg + geom_segment(data = df, aes(y=country, yend=country, x=0, xend=1), color = "#b2b2b2", size = 0.15)
# making the dumbbell
gg2 <- gg + geom_dumbbell(data=df, aes(y=country, x=Democrats, xend=Republicans),
size=1.5, color = "#B2B2B2", point.size.l=3, point.size.r=3,
point.color.l = "#9FB059", point.color.r = "#EDAE52")
I then wanted the dumbbell to read Democrat and Republican on top to label the two points (like this). This is where I get the error.
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Democrats, y=country, label="Democrats"),
color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Republicans, y=country, label="Republicans"),
color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
Any thoughts on what I might be doing wrong?

I think it would be easier to build your own "dumbbells" with geom_segment() and geom_point(). Working with your df and changing the variable refences "country" to "policy":
library(tidyverse)
# gather data into long form to make ggplot happy
df2 <- gather(df,"party", "value", Democrats:Republicans)
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
# our dumbell
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
# the text labels
geom_text(aes(label = party), vjust = -1.5) + # use vjust to shift text up to no overlap
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) + # named vector to map colors to values in df2
scale_x_continuous(limits = c(0,1), labels = scales::percent) # use library(scales) nice math instead of pasting
Produces this plot:
Which has some overlapping labels. I think you could avoid that if you use just the first letter of party like this:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(aes(label = gsub("^(\\D).*", "\\1", party)), vjust = -1.5) + # just the first letter instead
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red"),
guide = "none") +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
Only label the top issue with names:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(data = filter(df2, policy == "Not enough restrictions on gun ownership"),
aes(label = party), vjust = -1.5) +
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) +
scale_x_continuous(limits = c(0,1), labels = scales::percent)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

geom_histogram with proportions and factor data - r

Related

How to automatically change label color depending on relative values (maximum/minimum)?

R Windrose percent label on figure

Bar plot ggplot2 - Error: Aesthetics must be either length 1 or the same as the data (150): fill, x, y

Duplicate items in ggplot2 legend

Aesthetics must be either length 1 or the same as the data (1): x, y, label

Categories

Resources