ggplot: missing values on the x scale (R)

I made this plot with the following code:
plot <- ggplot(df, aes(mounth, value, fill=year))
plot <- plot + geom_bar(stat = "identity", position = 'dodge')+ facet_grid(. ~ variable)+ geom_text(aes(label = value),size=1.8,vjust = -0.5, position = position_dodge(0.9))+
# title is in Hebrew, roughly: "Youth Protection Authority: number of placements by report, 2019 vs 2020"
ggtitle("חסות הנוער: כמות השמות לפי דיווח \n שנת 2019 VS שנת 2020")
plot
But the first month (3) and the last month (9) have disappeared from the x-axis, and I cannot get them to appear.
Also, is there a way to arrange the panels in two rows instead of one?
Thank you

Try this on your data. The issue is that you are mapping a numeric variable to the x-axis, so ggplot2 uses a continuous scale and picks its own breaks. Wrap the variable in factor() so that every month gets its own tick. Here is the code (and next time, please include your data or a sample of it):
#Code
plot <- ggplot(df, aes(factor(mounth), value, fill=year))
plot <- plot + geom_bar(stat = "identity", position = 'dodge')+
facet_grid(. ~ variable)+
geom_text(aes(label = value),size=1.8,vjust = -0.5, position = position_dodge(0.9))+
ggtitle("חסות הנוער: כמות השמות לפי דיווח \n שנת 2019 VS שנת 2020")
plot
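The second part of the question (panels in two rows) is not covered above. A minimal sketch, assuming the panels come from the variable column: facet_wrap() accepts an nrow argument, whereas facet_grid(. ~ variable) always lays the panels out in a single row. The data frame below is a made-up stand-in for the asker's df (the column names are taken from the question, the values and the four variable levels are hypothetical), and year is converted to a factor here so that the bars dodge by year.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data; the values are invented for illustration only
set.seed(1)
df <- expand.grid(
  mounth   = 3:9,
  year     = c(2019, 2020),
  variable = c("a", "b", "c", "d")
)
df$value <- sample(1:50, nrow(df), replace = TRUE)

p <- ggplot(df, aes(factor(mounth), value, fill = factor(year))) +
  geom_col(position = "dodge") +
  geom_text(aes(label = value), size = 1.8, vjust = -0.5,
            position = position_dodge(0.9)) +
  # facet_wrap() wraps the panels; nrow = 2 requests two rows
  facet_wrap(~ variable, nrow = 2) +
  labs(x = "mounth", fill = "year")
p
```

If the months should stay numeric, adding scale_x_continuous(breaks = 3:9) to the original plot is another way to get a tick for every month.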

Related

How to add percentages on top of a histogram when data is grouped

This is not my data (for confidentiality reasons), but I have tried to create a reproducible example using a dataset included in the ggplot2 library. I have a histogram summarizing the value of some variable by group (a factor with 2 levels). First, I did not want the counts but proportions of the total, so I used this code:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>% as.data.frame() %>% filter(cut=="Premium" | cut=="Ideal")
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="Count") +
theme_bw() + theme(legend.position="none")
It gave me the histogram I expected, with one panel per cut.
The issue is that I would like to print the numeric percentages on top of the bins and haven't found a way to do so.
As I saw it done for printing counts elsewhere, I attempted to print them using stat_bin(), including the same y and label values as the y in geom_histogram, thinking it would print the right numbers:
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
stat_bin(aes(y=after_stat(width*density),label=after_stat(width*density*100)),geom="text",vjust=-.5) +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="%") +
theme_bw() + theme(legend.position="none")
However, it prints far more values than there are bins, the values are not consistent with the bar heights, and they do not respect vjust=-.5, which should place them slightly above the bars.
What am I missing here? I know that if there was no grouping variable/facet_wrap, I could use after_stat(count/sum(count)) instead of after_stat(width*density) and it seems that it would have fixed my issue. But I need the histograms for both groups to appear next to each other. Thanks in advance!
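You have to pass the same binning arguments (binwidth and center) to stat_bin as to geom_histogram when adding your labels, so that both layers use the same bins and the labels align with the bars. Without them, stat_bin falls back to its default of 30 bins, which is why you get more labels than bars: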
library(ggplot2)
library(dplyr)
df_example <- diamonds %>%
as.data.frame() %>%
filter(cut == "Premium" | cut == "Ideal")
ggplot(df_example, aes(x = z, fill = cut)) +
geom_histogram(aes(y = after_stat(width * density)),
binwidth = 1, center = 0.5, col = "black"
) +
stat_bin(
aes(
y = after_stat(width * density),
label = scales::number(after_stat(width * density), scale = 100, accuracy = 1)
),
geom = "text", binwidth = 1, center = 0.5, vjust = -.25
) +
facet_wrap(~cut) +
scale_x_continuous(breaks = seq(0, 9, by = 1)) +
scale_y_continuous(labels = scales::number_format(scale = 100)) +
scale_fill_manual(values = c("#CC79A7", "#009E73")) +
labs(x = "Depth (mm)", y = "%") +
theme_bw() +
theme(legend.position = "none")
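As a sanity check on the printed labels, the per-group percentages can also be computed by hand. This is only a sketch: it assumes integer bin edges that are closed on the right, which is how binwidth = 1 with center = 0.5 should behave, and the names bin and bin_pct are made up for illustration.

```r
library(ggplot2)  # only needed here for the diamonds dataset
library(dplyr)

bin_pct <- diamonds %>%
  filter(cut %in% c("Premium", "Ideal")) %>%
  # base::cut() is called explicitly because the data also has a column named cut
  mutate(bin = base::cut(z, breaks = seq(0, ceiling(max(z)), by = 1),
                         include.lowest = TRUE)) %>%
  count(cut, bin) %>%
  group_by(cut) %>%
  # share of each bin within its own group, in percent
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

bin_pct
```

The rounded pct values should correspond to the labels drawn above the bars in each facet, and each group's percentages add up to 100.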

How to add an asterisk (significance) above specific bars in a faceted bar graph (ggplot, R)

I am trying to plot a bar graph using ggplot. The graph displays as I would like, but I can't figure out how to add an asterisk "*" above some of the bars to show significance. Whenever I try, it either adds them to all of the bars or completely skews the graph.
I need an asterisk only above:
Group A: Treatment A and Treatment B;
Group B: Treatment A
Thank you!
Treatment <- rep(c("Treatment A","Treatment A","Treatment B","Treatment B"), 3)
Group <- c(rep(c("A. Paired cohort"), 4),rep(c("B. Low cohort"), 4),rep(c("C. Normal cohort"), 4))
Outcome <- rep(c("Outcome P","Outcome D"),6)
Percent <- c(6.7,3.3,22.6,16.1,4.9,2.4,25,15,8.2,4.1,20.8,17)
df <- data.frame(Treatment,Group,Outcome,Percent)
#keep original order, not alphabetised
df$Outcome <- factor(df$Outcome, levels = unique(df$Outcome))
#plot graph
ggplot(df,
aes(x=Outcome, y=Percent)) +
geom_bar(aes(fill=Treatment),stat="identity", position="dodge")+
theme_classic() +
scale_fill_grey() +
xlab("") + ylab("%") +
facet_wrap(~Group) +
ylim(0,40)
One option would be to:
Add an indicator variable to your data to flag significance, using e.g. dplyr::case_when.
Use this indicator in geom_text to conditionally add an asterisk as a label on top of the desired bars. To align the * with the bars, we have to map Treatment to the group aesthetic and use position_dodge(width = .9), where .9 is the default width of a geom_bar/geom_col. Additionally, I set vjust = -.1 to put the labels slightly above the bars.
library(ggplot2)
library(dplyr)
df$significant <- dplyr::case_when(
grepl("^A", df$Group) & grepl("(A|B)$", df$Treatment) ~ TRUE,
grepl("^B", df$Group) & grepl("A$", df$Treatment) ~ TRUE,
TRUE ~ FALSE
)
# plot graph
ggplot(df, aes(x = Outcome, y = Percent)) +
geom_col(aes(fill = Treatment), position = "dodge") +
geom_text(aes(label = ifelse(significant, "*", ""), group = Treatment),
position = position_dodge(width = .9), vjust = -.1, size = 20 / .pt) +
theme_classic() +
scale_fill_grey() +
labs(x = "", y = "%") +
facet_wrap(~Group) +
ylim(0, 40)
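If the regular expressions feel fragile, the indicator can also be built from an explicit list of group/treatment pairs. This is a sketch of an alternative, not part of the answer above; sig_pairs is a hypothetical helper name, and the data is the sample data from the question.

```r
# Sample data from the question
Treatment <- rep(c("Treatment A", "Treatment A", "Treatment B", "Treatment B"), 3)
Group <- c(rep("A. Paired cohort", 4), rep("B. Low cohort", 4), rep("C. Normal cohort", 4))
Outcome <- rep(c("Outcome P", "Outcome D"), 6)
Percent <- c(6.7, 3.3, 22.6, 16.1, 4.9, 2.4, 25, 15, 8.2, 4.1, 20.8, 17)
df <- data.frame(Treatment, Group, Outcome, Percent)

# Hypothetical lookup of the group/treatment combinations that should get an asterisk
sig_pairs <- c(
  "A. Paired cohort|Treatment A",
  "A. Paired cohort|Treatment B",
  "B. Low cohort|Treatment A"
)

# TRUE for rows whose group/treatment combination is listed above
df$significant <- paste(df$Group, df$Treatment, sep = "|") %in% sig_pairs

df[df$significant, c("Group", "Treatment", "Outcome")]
```

The resulting significant column can be passed to geom_text exactly as in the plot code above.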

How to highlight a column in ggplot2

I have the following graph and I want to highlight both columns for watermelons, as it has the highest juice_content and weight. I know how to change the color of the columns, but I would like the WHOLE columns to be highlighted. Any idea how to achieve this? There doesn't seem to be anything similar online.
library(ggplot2)
library(tidyr) # for gather()
fruits <- c("apple","orange","watermelons")
juice_content <- c(10,1,1000)
weight <- c(5,2,2000)
df <- data.frame(fruits,juice_content,weight)
# reshape to long format: one row per fruit and measure
df <- gather(df,compare,measure,juice_content:weight, factor_key=TRUE)
plot <- ggplot(df, aes(fruits,measure, fill=compare)) + geom_bar(stat="identity", position=position_dodge()) + scale_y_log10()
An option is to use gghighlight
library(gghighlight)
ggplot(df, aes(fruits,measure, fill = compare)) +
geom_col(position = position_dodge()) +
scale_y_log10() +
gghighlight(fruits == "watermelons")
In response to your comment, how about working with different alpha values
ggplot(df, aes(fruits,measure)) +
geom_col(data = . %>% filter(fruits == "watermelons"),
mapping = aes(fill = compare),
position = position_dodge()) +
geom_col(data = . %>% filter(fruits != "watermelons"),
mapping = aes(fill = compare),
alpha = 0.2,
position = position_dodge()) +
scale_y_log10()
Or you can achieve the same with one geom_col and a conditional alpha (thanks @Tjebo)
ggplot(df, aes(fruits, measure)) +
geom_col(
mapping = aes(fill = compare, alpha = fruits == 'watermelons'),
position = position_dodge()) +
scale_alpha_manual(values = c(0.2, 1)) +
scale_y_log10()
You could use geom_area to draw a highlight behind the bars. You have to force the x scale to be discrete first, which is why I've used geom_blank (see this answer: geom_ribbon overlay when x-axis is discrete). Note that geom_ribbon and geom_area are effectively the same, except that geom_area always has 0 as ymin.
#minor edit so that the level isn't hard coded
#levels() needs a factor; on R >= 4.0, data.frame() keeps strings as character
df$fruits <- factor(df$fruits)
watermelon_level <- which(levels(df$fruits) == "watermelons")
AreaDF <- data.frame(fruits = c(watermelon_level-0.5,watermelon_level+0.5))
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes( y = max(df$measure)), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
Edit to address comment
If you want to highlight multiple fruits, then you could do something like this. You need a data.frame with the x and y values where you want the geom_area, including dropping it to 0 between the highlighted fruits. I'm sure there are slightly tidier ways of building the data.frame, but this one works.
highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))
AreaDF <- data.frame(fruits = unlist(lapply(highlight_level, function(x) c(x -0.51,x -0.5,x+0.5,x+0.51))),
yval = rep(c(1,max(df$measure),max(df$measure),1), length(highlight_level)))
AreaDF <- AreaDF %>% mutate(
yval = ifelse(floor(fruits) %in% highlight_level & ceiling(fruits) %in% highlight_level, max(df$measure), yval)) %>%
arrange(fruits) %>% distinct()
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes(y = yval ), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
plot
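A shorter variant of the same idea, offered as a sketch rather than as part of the answers above: annotate("rect", ...) can draw the band directly, again after a geom_blank layer so that the x scale is set up as discrete. It swaps gather() for tidyr::pivot_longer() and assumes fruits is a factor; pos is a hypothetical helper name. ymin = 0 is used because the log scale maps 0 to the bottom of the panel, just as it does for the bars themselves.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(
  fruits        = factor(c("apple", "orange", "watermelons")),
  juice_content = c(10, 1, 1000),
  weight        = c(5, 2, 2000)
)
df <- pivot_longer(df, c(juice_content, weight),
                   names_to = "compare", values_to = "measure")

# Position of the category to highlight on the discrete x axis
pos <- which(levels(df$fruits) == "watermelons")

p <- ggplot(df, aes(fruits, measure)) +
  geom_blank() +  # trains the discrete x scale before the numeric rectangle is added
  annotate("rect", xmin = pos - 0.5, xmax = pos + 0.5,
           ymin = 0, ymax = Inf, fill = "yellow", alpha = 0.5) +
  geom_col(aes(fill = compare), position = position_dodge()) +
  scale_y_log10()
p
```

To highlight several fruits, pos can hold several positions, since annotate() accepts vectors for xmin and xmax.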

Adding labels to ends of bars in ggplot geom_bar

Here's a bar chart:
ggplot(mtcars) +
geom_bar(aes(x = reorder(factor(cyl), mpg), y = mpg), stat="identity") +
coord_flip()
This produces a horizontal bar chart with one bar per cyl value.
I would like to add labels on the end showing the total value of mpg in each bar. For example, 4cyl looks to be around about 290 just from eyeballing. I want to add a label showing the exact number to the bars.
I'd like to experiment and see how they look, so for completeness:
Inside at the top of the bars
Outside the bars along the top
Bonus if I'm able to control whether the labels display vertically or horizontally.
I found this SO post but have struggled to replicate the chosen answer. Here's my attempt:
ggplot(mtcars) +
geom_bar(aes(x = reorder(factor(cyl), mpg), y = mpg), stat="identity") +
coord_flip() +
geom_text(aes(label = mpg))
Which gives an error:
Error: geom_text requires the following missing aesthetics: x, y
How can I add labels to the ends of the bars?
This would do what you need through generating a new data.frame for label plotting. You can customize the location of texts by adjusting nudge_y and angle.
library(ggplot2)
library(dplyr)
tmp <- mtcars %>% group_by(cyl) %>% summarise(tot_mpg = sum(mpg))
tmp$cyl <- factor(tmp$cyl)
ggplot(mtcars) +
geom_bar(aes(x = reorder(factor(cyl), mpg), y = mpg), stat="identity") +
coord_flip() + geom_text(data = tmp, nudge_y = 10, angle = 270,
aes(x = cyl, y = tot_mpg, label = tot_mpg))
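For the two placements asked about (inside the end of the bar and outside it), one possible sketch is to plot the summarised totals directly and move the label with hjust. This is a variation on the answer above, not a replacement for it: totals, base, p_outside and p_inside are made-up names, and the bars are ordered by total mpg here rather than by mean mpg as in the question.

```r
library(ggplot2)
library(dplyr)

# One row per cyl with the total mpg, so the bars and the labels share the same data
totals <- mtcars %>%
  group_by(cyl) %>%
  summarise(tot_mpg = sum(mpg)) %>%
  mutate(cyl = factor(cyl))

base <- ggplot(totals, aes(x = reorder(cyl, tot_mpg), y = tot_mpg)) +
  geom_col() +
  # extra room at the end of the value axis so outside labels are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  coord_flip() +
  labs(x = "cyl", y = "total mpg")

# After coord_flip() the bars run left to right, so hjust moves the label along the bar
p_outside <- base + geom_text(aes(label = tot_mpg), hjust = -0.1)
p_inside  <- base + geom_text(aes(label = tot_mpg), hjust = 1.1, colour = "white")

p_outside
p_inside
```

Adding angle = 90 or angle = 270 inside geom_text() rotates the labels if vertical text is preferred.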

ggplot2: plotting error bars for groups without overlap

I wish to show the effect of two pollutants on the same outcome and was happy with the plot when there are no groups. Now when I want to plot the same data for all-year and stratified by season, I either get overlaps of error bars or three separate panels which are not optimal for my need.
Sample data can be accessed here:
https://drive.google.com/file/d/0B_4NdfcEvU7LV2RrMjVyUmpoSDg/edit?usp=sharing
As an example with the following code I create a plot for all-year:
ally<-subset(df, seas=="allyear")
ggplot(ally,aes(x = set, y = pinc,ymin = lcinc, ymax =ucinc,color=pair,shape=pair)) +
geom_point(position=position_dodge(width=0.5) ,size = 2.5) +
geom_linerange(position=position_dodge(width=0.5), size =0.5) + theme_bw() +
geom_hline(aes(yintercept = 0)) +
labs(colour="Pollutant", shape="Pollutant", y="Percent Increase", x="") +
scale_x_discrete(labels=c(NO2=expression(NO[2]),
NOx=expression(NO[x]),
Coarse= expression(Coarse),
PM25=expression(PM[2.5]),
PM10=expression(PM[10]))) +
theme(plot.title = element_text(size = 12,face="bold" )) +
theme(axis.title=element_text(size="12") ,axis.text=element_text(size=12))
But when I add facet_grid(. ~ seas) I will have three separate panels. How can I display this data for all year and divided by seasons in one panel?
Either color or shape needs to be used to represent season, not pollutant.
Then this should come close to what you want:
library(ggplot2)
ggplot(df, aes(x = set, y = pinc,ymin = lcinc, ymax =ucinc,
color=seas, shape=pair)) +
geom_point(position=position_dodge(width=0.5), size = 2.5) +
geom_linerange(position=position_dodge(width=0.5), size =0.5) + theme_bw() +
geom_hline(aes(yintercept = 0)) +
labs(colour="Season", shape="Pollutant", y="Percent Increase", x="") +
scale_x_discrete(labels=c(NO2=expression(NO[2]),
NOx=expression(NO[x]),
Coarse= expression(Coarse),
PM25=expression(PM[2.5]),
PM10=expression(PM[10]))) +
theme(plot.title = element_text(size = 12,face="bold" )) +
theme(axis.title=element_text(size="12") ,axis.text=element_text(size=12))
I do think that faceting gives you better graphs here:
if you want to focus attention on the comparison between seasons for each pollutant, add facet_grid(~pair, labeller=label_both) to the plot above;
if you want to focus attention on the comparison between pollutants for each season, add facet_grid(~seas, labeller=label_both) instead.

Resources