Stacked Bargraph showing Percentage - r

Currently I have plotted a stacked bar graph showing the number of flights with and without departure delays in each month, using the following code:
library(dplyr)
library(ggplot2)
q1 %>%
  mutate(DepDelay1 = factor(x = DepDelay1, levels = c(0, 1), labels = c("No", "Yes"))) %>%
  filter(Year == 2005) %>%
  ggplot(aes(x = factor(Month), fill = DepDelay1)) +
  geom_bar(position = "stack") +
  ggtitle("DepDelay of Flight") +
  guides(fill = guide_legend("Delay")) +
  xlab("Month") +
  ylab("Flight Count") +
  geom_text(aes(label = after_stat(count)),
            stat = "count",
            colour = "white",
            size = 2.5,
            position = position_stack(vjust = 0.5))
[Graph example for position="stack"]
However, I cannot produce a 100% stacked bar graph after changing
geom_bar(position="stack")
to
geom_bar(position="fill")
[Graph example for position="fill"]
This approach worked before I adjusted geom_text and added the filter to my current code.
Does anyone know what the issue is? From what I understand, a y variable is needed, but I don't have a suitable column in the dataset to use for it; I'm just trying to plot the count per month.
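The most likely cause is that geom_text() still uses position_stack() while the bars now use position = "fill". The bars are rescaled to proportions between 0 and 1, but the labels stay at their raw count heights, so the y scale stretches to fit the labels and the bars shrink toward the bottom. No y variable is needed; the text layer only needs the matching position_fill(). A minimal sketch, assuming a small invented q1 data frame (only the column names Year, Month and DepDelay1 come from the question); after_stat(count) is the current spelling of ..count.. (ggplot2 3.3.0 and later).

```r
library(dplyr)
library(ggplot2)

# Invented stand-in for q1; only the column names come from the question
set.seed(42)
q1 <- data.frame(
  Year      = 2005,
  Month     = sample(1:12, 600, replace = TRUE),
  DepDelay1 = sample(c(0, 1), 600, replace = TRUE, prob = c(0.7, 0.3))
)

p <- q1 %>%
  filter(Year == 2005) %>%
  mutate(DepDelay1 = factor(DepDelay1, levels = c(0, 1), labels = c("No", "Yes"))) %>%
  ggplot(aes(x = factor(Month), fill = DepDelay1)) +
  # Bars rescaled so that each month sums to 1
  geom_bar(position = "fill") +
  # Labels use the same "fill" adjustment, centred in each segment
  geom_text(aes(label = after_stat(count)),
            stat = "count",
            colour = "white",
            size = 2.5,
            position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "DepDelay of Flight", x = "Month",
       y = "Share of flights", fill = "Delay")
```

Print p to draw it. The labels still show counts; only their vertical placement changes.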

Related

How to remove or not show any of the data points above and below the error bar in boxplot and violin plot?

I'm working on a very large dataset containing around 1.6M data points. I'm using the violin plot along with the boxplot to represent the data from each category (there are multiple categories and each has its own set of values).
The problem is that there are many data points (outliers) above the error bar (the upper whisker of the boxplot), and because of them the plot has lost its focus.
Earlier I thought that removing all data points above a specific value might help me show what I wanted. But it didn't work, because the error-bar range is different for each category, so I lost most of the data from the other categories.
So now I want to remove (or hide) the data points above the error bar for each category individually, in both the box plot and the violin plot. Setting outlier.shape=NA in geom_boxplot worked fine for the boxplot. Similarly, I want to remove from the violin plot all data points that lie above the boxplot's error bar.
Here are the plots before and after using outlier.shape=NA.
Before: [plot]
After: [plot]
Here is my code :
med_violin <- data %>%
left_join(sample_size) %>%
mutate(myaxis = fct_reorder(paste0(Country), Diff, .fun='median')) %>%
ggplot( aes(x=myaxis, y=Diff, fill=Country)) +
geom_violin(width=1.5, color = "black", position = position_dodge(width=1.8), trim = TRUE) +
geom_boxplot(width=0.2, color="white", alpha=0.01, outlier.colour="red", outlier.size=0.1, outlier.shape = NA) +
scale_y_continuous(breaks = c(0,25,50,75,100,125,150,525,550))+
coord_trans(y = squash_axis(150, 525, 15)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
theme(axis.text.x = element_text(size = 8))+
theme(legend.position ="none")+
scale_fill_viridis(discrete = TRUE) +
xlab("")
med_violin
How can I do the same thing in geom_violin, so that it also does not show the data points above the error bar?
I also tried the approach from "Ignore outliers in ggplot2 geom_violin", but it did not work for me.
Thank you.
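One way to do this, sketched here on invented data rather than the real 1.6M-row set, is to compute the boxplot fences per category yourself and give geom_violin() a trimmed copy of the data, while geom_boxplot() keeps the full data so its quartiles and whiskers are unchanged. The helper in_fences() and the data frame df are hypothetical; the rule (Q1 - 1.5 * IQR to Q3 + 1.5 * IQR, default quantile()) is the one stat_boxplot() uses by default.

```r
library(ggplot2)

# Invented example data: two groups, each with one extreme value
set.seed(1)
df <- data.frame(
  Country = rep(c("A", "B"), each = 100),
  Diff    = c(rnorm(100, mean = 50, sd = 10), rnorm(100, mean = 80, sd = 20))
)
df$Diff[c(1, 101)] <- c(500, 900)

# Hypothetical helper: TRUE for values inside the boxplot fences
in_fences <- function(v, coef = 1.5) {
  q <- quantile(v, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  v >= q[[1]] - coef * iqr & v <= q[[2]] + coef * iqr
}

# Apply the rule separately within each Country
keep <- as.logical(ave(df$Diff, df$Country, FUN = in_fences))
df_trim <- df[keep, ]

p <- ggplot(df, aes(x = Country, y = Diff, fill = Country)) +
  # Violin estimated from the trimmed data only
  geom_violin(data = df_trim, trim = TRUE) +
  # Boxplot still computed from the full data; outlier points hidden
  geom_boxplot(width = 0.2, outlier.shape = NA) +
  theme(legend.position = "none")
```

In the question's pipeline, the trimmed frame would be built from the data after mutate() and passed as geom_violin(data = ...). Depending on the ggplot2 version, the hidden boxplot outliers may still stretch the y-axis; ggplot2 3.5.0 added geom_boxplot(outliers = FALSE), which also keeps them out of the scale.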

How to adjust ggplot secondary y-axis to change position of bar plot layer of mixed plot?

I have created a mixed plot (2 lines and 1 bar plot) using ggplot2. The bars represent the percent difference between the two lines. It looks good except for the position of the bars relative to the lines. I would like to adjust the secondary y-axis (i.e. the bar plot axis) to range from -50 to 50 (i.e. centred at 0) without affecting the primary y-axis, which I would like to start at 0. Does anyone know a way to do this? My current code and the resulting plot look like this:
year<-c(1995:2018)
value1<-c(1116.87,3030.85,2676.40,3809.81,2459.74,2666.61,2678.28,3303.45,1839.21,3567.79,4529.22,5838.21,7762.13,8079.70,9615.55,9645.06,8297.23,8974.69,12757.06,13052.86,13670.74,17598.57,17190.01,20192.92)
value2<-c(998.22,3551.52,2421.50,3647.22,2085.44,2558.46,2863.98,3332.18,1606.40,3445.12,4893.11,5486.48,7242.37,7356.78,8810.64,7787.83,7507.25,8442.26,10347.11,11002.82,8783.90,15604.60,14648.09,15368.58)
df1<-data.frame(year,value1,value2)
df1$diff<-((df1$value1-df1$value2)/df1$value1)*100
library(ggplot2)
plot1<-ggplot(df1, aes(x = year)) +
geom_bar(aes(x=year, y = (diff*200), fill="percent diff"), stat="identity") +
geom_line(aes(y=value1,color="A")) +
geom_line(aes(y=value2, color="B")) +
scale_y_continuous(sec.axis = sec_axis(~./200, name = "percent diff")) +
ylab("Value") +
theme(panel.grid.minor = element_blank(),legend.title = element_blank(), legend.spacing.y = unit(-0.1, "cm")) +
scale_color_manual(" ",values=c("A"="#619CFF", "B"="#F8766D", "percent diff"=NA)) +
scale_fill_manual(" ",values=c("percent diff"="gray"))
plot1
To be clear, the plot below (made in Excel) is what I am trying to accomplish.
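sec_axis() cannot have an independent range; it only relabels the primary axis through a one-to-one transformation. So one option is to pick a transformation that maps the secondary range -50..50 onto the primary range 0..top, and to draw each bar from the primary value that corresponds to 0 %. The sketch below uses the question's data; top, k and to_primary() are names introduced for illustration, and geom_rect() replaces geom_bar() because geom_bar() bars always start at primary 0, which would be -50 % on the secondary axis.

```r
library(ggplot2)

year <- 1995:2018
value1 <- c(1116.87, 3030.85, 2676.40, 3809.81, 2459.74, 2666.61, 2678.28, 3303.45,
            1839.21, 3567.79, 4529.22, 5838.21, 7762.13, 8079.70, 9615.55, 9645.06,
            8297.23, 8974.69, 12757.06, 13052.86, 13670.74, 17598.57, 17190.01, 20192.92)
value2 <- c(998.22, 3551.52, 2421.50, 3647.22, 2085.44, 2558.46, 2863.98, 3332.18,
            1606.40, 3445.12, 4893.11, 5486.48, 7242.37, 7356.78, 8810.64, 7787.83,
            7507.25, 8442.26, 10347.11, 11002.82, 8783.90, 15604.60, 14648.09, 15368.58)
df1 <- data.frame(year, value1, value2)
df1$diff <- (df1$value1 - df1$value2) / df1$value1 * 100

# Secondary value s in [-50, 50] maps to primary (s + 50) * k,
# so 0 % sits half-way up a primary axis running from 0 to `top`
top <- max(df1$value1, df1$value2)
k <- top / 100
to_primary <- function(s) (s + 50) * k

plot1 <- ggplot(df1) +
  # Rectangles grow up or down from the primary value of 0 %
  geom_rect(aes(xmin = year - 0.4, xmax = year + 0.4,
                ymin = to_primary(0), ymax = to_primary(diff),
                fill = "percent diff")) +
  geom_line(aes(x = year, y = value1, colour = "A")) +
  geom_line(aes(x = year, y = value2, colour = "B")) +
  scale_y_continuous(
    name = "Value",
    limits = c(0, top),
    # Inverse of to_primary(): primary -> secondary
    sec.axis = sec_axis(~ . / k - 50, name = "percent diff",
                        breaks = seq(-50, 50, by = 25))
  ) +
  scale_colour_manual(name = NULL, values = c(A = "#619CFF", B = "#F8766D")) +
  scale_fill_manual(name = NULL, values = c("percent diff" = "gray")) +
  theme(panel.grid.minor = element_blank())
```

All percent differences in this data lie within ±50. If a value ever fell outside that range, the secondary range (and to_primary()) would need widening, otherwise that bar would be dropped by the primary limits.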

Visualizing just top portion of stacked barplot in ggplot2 in R

I made a stacked barplot in ggplot2 in R:
ggplot(Count_dataframe_melt, aes(x = as.factor(variable), y = value, fill = fill)) +
geom_bar(stat = "identity",position="fill")+ scale_y_continuous(name = "Y-axis",labels = scales::percent)
I want to just visualize the top portion of the stacked barplot like so:
I've looked everywhere and can't figure out how to do this. Does anyone know how?
You can use coord_cartesian to "zoom in" on the area you want.
# your plot code...
ggplot(Count_dataframe_melt, aes(x = as.factor(variable), y = value, fill = fill)) +
geom_bar(stat = "identity",position="fill") +
scale_y_continuous(name = "Y-axis",labels = scales::percent) +
# set axis limits in coord_cartesian
coord_cartesian(ylim = c(0.75, 1))
Note that many people consider bar plots that don't start at 0 misleading. A line plot may be a better way to visualize this data.
Since the areas you want to show are less than 20% of the total area, you could flip the bars (reverse the stacking order) so that only the colored areas are shown. The y-axis then goes from 0 to 25%, and you can use the figure caption to explain that the remaining data is in the gray category.
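The "flip" suggestion can be sketched with position_fill(reverse = TRUE), which puts the first factor level at the bottom of each bar, so the small categories start at 0 and the dominant one ends up on top; coord_cartesian() then zooms to 0-25 % without discarding any data. The data below is invented, and "other" is a hypothetical name for the dominant gray category.

```r
library(ggplot2)

# Invented stand-in for Count_dataframe_melt; "other" dominates each bar
Count_dataframe_melt <- data.frame(
  variable = rep(c("s1", "s2", "s3"), each = 4),
  fill     = factor(rep(c("x", "y", "z", "other"), times = 3),
                    levels = c("x", "y", "z", "other")),
  value    = c(5, 6, 4, 85, 10, 5, 5, 80, 3, 4, 3, 90)
)

p <- ggplot(Count_dataframe_melt,
            aes(x = as.factor(variable), y = value, fill = fill)) +
  # reverse = TRUE stacks the first level ("x") at the bottom, "other" on top
  geom_col(position = position_fill(reverse = TRUE)) +
  scale_y_continuous(name = "Y-axis", labels = scales::percent) +
  scale_fill_manual(values = c(x = "#1b9e77", y = "#d95f02",
                               z = "#7570b3", other = "grey80")) +
  # Zoom in; setting limits on the scale instead would discard out-of-range values
  coord_cartesian(ylim = c(0, 0.25))
```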

How to add label ticks to a pie chart created with ggplot2?

I built a pie chart using the ggplot2 package, but because some of the slices are very small, the group labels overlap one another, and so do the value labels. I'm looking for a way to move the labels further away from the slices and link each slice to its label with a line.
I'm using this data:
a<-c(0.5,0.01,2,50,40,7)
data<-data.frame(a)
data$b<-c("A","B","C","D","E","F")
and I used the following code:
p<- ggplot(data,aes(x=1,y=a,fill=b))
p<- p + geom_bar(stat = "identity",color="black")
p<- p+coord_polar("y")
br<-cumsum(data$a) - data$a/2
p<-p+theme(legend.position = "none",axis.text.x=element_text(color='black',size = 15))+
scale_y_continuous(breaks=br,labels=data$b)+
geom_text(aes(y = a/3 + c(0, cumsum(a)[-length(a)]),
label=a),size=6)
and the resulting plot is:
and I'm looking for something similar to this one (which I found online):
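One way to do this in plain ggplot2 is to compute each slice's mid-point yourself, draw a geom_segment() leader line from the pie's edge outwards, and place a geom_text() label at the end of it. This is a sketch: the label radii (1.7, 1.95, 2.2) and the staggering are arbitrary choices, and the mid and r columns are introduced for illustration. position_stack(reverse = TRUE) is used so the slices are stacked in row order from y = 0; with the default order the first row ends up on top, which is why cumsum()-based positions such as br in the question can land on the wrong slices. For very crowded slices, the ggrepel package's geom_text_repel() is a common alternative.

```r
library(ggplot2)

data <- data.frame(a = c(0.5, 0.01, 2, 50, 40, 7),
                   b = c("A", "B", "C", "D", "E", "F"))

# Mid-point of each slice on the y (angle) scale; this matches the slices
# because position_stack(reverse = TRUE) stacks rows A..F from y = 0 upwards
data$mid <- cumsum(data$a) - data$a / 2
# Stagger label distances so neighbouring thin slices (A, B, C) separate
data$r <- rep(c(1.7, 1.95, 2.2), length.out = nrow(data))

p <- ggplot(data, aes(x = 1, y = a, fill = b)) +
  geom_col(colour = "black", width = 1,
           position = position_stack(reverse = TRUE)) +
  # Leader line from the pie edge (x = 1.5 for width = 1) towards the label
  geom_segment(aes(x = 1.5, xend = r - 0.1, y = mid, yend = mid)) +
  geom_text(aes(x = r, y = mid, label = paste0(b, ": ", a)), size = 4) +
  # clip = "off" keeps labels near the edge of the panel from being cut
  coord_polar(theta = "y", clip = "off") +
  theme_void() +
  theme(legend.position = "none")
```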

How to calculate the percentages per bar in stacked bar plot?

I have produced a stacked bar plot using ggplot2 with the code below:
library(ggplot2)
g <- ggplot(data=df.m, aes(x=Type_Checklist,fill=Status)) +
geom_bar(stat="bin", position="stack", size=.3) +
scale_fill_hue(name="Status") +
xlab("Checklists") + ylab("Quantity") +
ggtitle("Status of Assessment Checklists") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, vjust=0.5, size=10))+
stat_bin(geom = "text",aes(label = paste(round((..count..)/sum(..count..)*100), "%")),vjust = 1)
print(g)
The code shows the percentages as labels and keeps the actual quantities on the y axis.
However, I wanted the percentages to be calculated per bar (shown on the x axis), not for the entire set of checks (bars in the plot).
I managed to do exactly that with the following code:
library(dplyr)
# count how many checklists per status
qty.checklist.by.status.this.week = df.m %>% count(Type_Checklist,Status)
# Add percentage
qty.checklist.by.status.this.week$percentage <- (qty.checklist.by.status.this.week$n/ nrNAs * 100)
#add column position to calculate where to place the percentage in the plot
qty.checklist.by.status.this.week <- qty.checklist.by.status.this.week %>% group_by(Type_Checklist) %>% mutate(pos = (cumsum(n) - 0.5 * n))
g <- ggplot(data=qty.checklist.by.status.this.week, aes(x=Type_Checklist,y=n,fill=Status)) +
geom_bar(stat="identity",position="stack", size=.3) +
scale_fill_hue(name="Status") +
xlab("Checklists") + ylab("Quantity") +
ggtitle("Status of Assessment Checklists") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, vjust=0.5, size=10))+
geom_text(aes(y = pos,label = paste(round(percentage), "%")),size=5)
print(g)
But that solution seems quite manual, since I need to calculate the position of each label, unlike the first plot, which positions the labels automatically using stat_bin.
Existing answers have been very useful, such as Showing percentage in bars in ggplot
and Show % instead of counts in charts of categorical variables
and How to draw stacked bars in ggplot2 that show percentages based on group?
and R stacked percentage bar plot with percentage of binary factor and labels (with ggplot)
but they don't address exactly this situation.
To reiterate, how could I use the first solution, but calculate the percentages per bar, not for the entire set of bars in the plot?
Could anyone give me some help please? Thanks!
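A sketch that keeps the automatic label placement of the first plot: let stat = "count" do the counting, and compute each label's share within its own bar inside after_stat(). There, count and x are columns produced by stat_count(), so ave(count, x, FUN = sum) gives the total of the bar each segment belongs to, and position_stack(vjust = 0.5) centres the labels. Two assumptions: df.m below is invented, and stat_count() stands in for stat = "bin", which in current ggplot2 requires a continuous x.

```r
library(ggplot2)

# Invented stand-in for df.m (one row per checklist item)
df.m <- data.frame(
  Type_Checklist = rep(c("Checklist 1", "Checklist 2"), times = c(10, 4)),
  Status = c(rep(c("Done", "Open"), times = c(7, 3)),
             rep(c("Done", "Open"), times = c(1, 3)))
)

g <- ggplot(df.m, aes(x = Type_Checklist, fill = Status)) +
  geom_bar(position = "stack") +
  scale_fill_hue(name = "Status") +
  xlab("Checklists") + ylab("Quantity") +
  ggtitle("Status of Assessment Checklists") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 10)) +
  # Share of each segment within its own bar: count / total count at that x
  geom_text(stat = "count",
            aes(label = after_stat(paste0(round(100 * count / ave(count, x, FUN = sum)), "%"))),
            position = position_stack(vjust = 0.5))
```

With this data the labels read 70% and 30% for Checklist 1, and 25% and 75% for Checklist 2, while the y axis still shows quantities.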
