What's wrong with ggplot bar chart fill? - r

I'm trying to create a ggplot bar chart and give some of the bars a different fill color.
I copied the code from elsewhere, but it doesn't work with my data.
Here is the code:
df <- data.frame(cat = c( 0, 1, 2, 3, 4),
perc = c(10, 20, 30, 40, 0),
mark = c( 0, 0, 0, 1, 0))
library(ggplot2)
ggplot(df) +
aes(x = cat, fill = mark, weight = perc) +
geom_bar()
But the result is a colorless chart, with this warning message:
The following aesthetics were dropped during statistical transformation: fill
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
What am I doing wrong?

The issue is that geom_bar uses stat_count by default, so it simply counts up the number of rows at each value of cat. This summary doesn't know what to do with the fill = mark part of your mapping, since there could be multiple values for mark in each category. In your case this isn't obvious because there is only one value for fill at each value of cat, but the same principle applies; if you are using a grouped summary function then you cannot have a row-wise fill variable.
My guess is that you are looking for geom_col, which plots the y values exactly as they appear in the data:
df <- data.frame(cat = c( 0, 1, 2, 3, 4),
perc = c(10, 20, 30, 40, 0),
mark = c( 0, 0, 0, 1, 0))
library(ggplot2)
ggplot(df) +
aes(x = cat, fill = mark, y = perc) +
geom_col()
Created on 2022-11-24 with reprex v2.0.2
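The warning message itself points to another possible fix: converting the numeric fill variable into a factor. As a minimal sketch, assuming a discrete two-color fill is wanted rather than the continuous gradient that a numeric mark produces, the original geom_bar call can be kept almost unchanged:

```r
library(ggplot2)

df <- data.frame(cat  = c( 0,  1,  2,  3, 4),
                 perc = c(10, 20, 30, 40, 0),
                 mark = c( 0,  0,  0,  1, 0))

# factor(mark) makes fill a discrete variable, so it defines the groups
# that stat_count summarises and is no longer dropped
p <- ggplot(df) +
  aes(x = cat, fill = factor(mark), weight = perc) +
  geom_bar()

# inspect the computed layer data: one fill colour per level of mark
unique(ggplot_build(p)$data[[1]]$fill)
```

Printing p draws the chart. Whether this or geom_col is the better fit depends on whether perc should be treated as a weight to sum or as a ready-made bar height.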

Related

How to make a barchart where the x-axis includes gaps

I'd like the x-axis of my barchart to be a continuous scale.
Here is my data:
list(
Century = c(1, 2, 3, 4, 5),
CenturyLabel = c("1st", "Bit later", "", "", "Post-Roman"),
Value = c(.2, .3, 0, 0, .4) ) %>% as_tibble()
I'm hoping to see bars for the 1st, 2nd, and 5th centuries with gaps for the 3rd and 4th.
The trick is to define your x-axis variable as a factor.
library("dplyr")
df <- tibble(
Century = c(1, 2, 3, 4, 5),
CenturyLabel = c("1st", "Bit later", "", "", "Post-Roman"),
Value = c(.2, .3, 0, 0, .4) )
df$CenturyFactor <- factor(df$Century, labels = df$CenturyLabel, ordered = TRUE)
You can then use CenturyFactor as the x-axis variable, and any plotting library that respects factor levels will show the gaps. One big caveat: duplicate labels (here the two empty strings) cause those centuries to be merged into a single level!
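The duplicate-label caveat can be checked directly in base R. This is only a small sketch using the values from the question; the variable names century, century_label and f are made up for illustration:

```r
century       <- c(1, 2, 3, 4, 5)
century_label <- c("1st", "Bit later", "", "", "Post-Roman")

# the two empty labels are identical, so factor() maps centuries 3 and 4
# onto one shared level instead of keeping them apart
f <- factor(century, labels = century_label, ordered = TRUE)

nlevels(f)     # number of distinct levels that survive
as.integer(f)  # centuries 3 and 4 get the same integer code
```

With five centuries but only four distinct labels, the factor ends up with four levels, which is why the two empty centuries would collapse into one slot on the axis.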
One way around this is to plot Century (1 to 5) but tweak the labels to show CenturyLabel. This will be library-specific. No factors needed.
Using ggplot2:
library("ggplot2")
ggplot(df, aes(x = Century, y = Value)) +
geom_col() +
scale_x_continuous(labels = df$CenturyLabel, breaks = df$Century)

Multiple horizontal barplots in one chart

I want to have two charts containing multiple horizontal bar graphs, each showing mean values of one of the two variables: fear and expectation. The bar graphs should be grouped by the dummies.
I have created single bar graphs with the mean values of fear and expectation grouped by each of the dummies but I don't know how to combine them properly.
x = data.frame(
id = c(1, 2, 3, 4, 5),
sex = c(1, 0, 1, 0, 1),
migration = c(0, 1, 0, 1, 0),
handicap = c(0, 1, 1, 1, 0),
east = c(0, 1, 1, 1, 0),
fear = c(1, 3, 4, 6, 3),
expectation = c(2, 3, 2, 5, 4))
I want to have it look like this basically:
https://ibb.co/3fz0GQ4
Any help would be greatly appreciated.
To get to the plot you show, you will need to reshape your data a bit:
library(tidyverse)
x2 <- x%>%
gather(fear, expectation, key = "group", value = "value")%>%
gather(sex, migration, handicap, east, key = "dummies", value = "dum_value")%>%
group_by(group, dummies, dum_value)%>%
summarize(prop = mean(value))
Then you can easily get to the plot:
x2%>%
ggplot(aes(y= prop, x = dummies, fill = factor(dum_value)))+
geom_bar(stat = "identity", position = "dodge")+
coord_flip()+
facet_wrap(~group)
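In current tidyr releases gather() is superseded by pivot_longer(). As a hedged sketch of the same reshaping, assuming a tidyr version that provides pivot_longer() (1.0.0 or later), the two steps could be written like this:

```r
library(dplyr)
library(tidyr)

x <- data.frame(
  id = c(1, 2, 3, 4, 5),
  sex = c(1, 0, 1, 0, 1),
  migration = c(0, 1, 0, 1, 0),
  handicap = c(0, 1, 1, 1, 0),
  east = c(0, 1, 1, 1, 0),
  fear = c(1, 3, 4, 6, 3),
  expectation = c(2, 3, 2, 5, 4))

x2 <- x %>%
  # stack the two outcome columns into group/value pairs
  pivot_longer(c(fear, expectation),
               names_to = "group", values_to = "value") %>%
  # stack the four dummy columns into dummies/dum_value pairs
  pivot_longer(c(sex, migration, handicap, east),
               names_to = "dummies", values_to = "dum_value") %>%
  group_by(group, dummies, dum_value) %>%
  # mean outcome per outcome / dummy / dummy value combination
  summarise(prop = mean(value), .groups = "drop")
```

The resulting x2 has the same columns as the gather() version, so the plotting code above should work with it unchanged.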

Project R - Barplot of occurrences of levels

I am struggling with some plots. I have a really big data.frame with some entries. To get an overview I will work with some test data.
Let's assume the following data:
Sender <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akz <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkz <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
data <- data.frame(Sender, Akz, NAkz)
I want to get a (stacked) barplot grouped by the column "Person". So for each person I want to illustrate the occurrences in the columns "A" and "NA". That is, one bar represents the column "A" with 3 "0"s and 4 "1"s, and next to this bar I want the column "NA" with 4 "0"s and 3 "1"s. It would be great if there were a way to have a legend and the total count of each level.
Thanks and all the best
Peter
PS: I found a picture which illustrates a cool barplot, but I am not able to create it since they work with integers and total amounts.
Your data is a bit messed up; I trust this is what you wanted to post:
data:
Person <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akzept <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkzept <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
df <- data.frame(Person, Akzept, NAkzept)
The key to plotting in ggplot2 is to arrange the data in long format, which the function gather does:
library(tidyverse)
df %>%
gather(var, val, Akzept:NAkzept) %>%
ggplot()+
geom_bar(aes(x = interaction(var, Person), fill = val))
or perhaps:
df %>%
gather(var, val, Akzept:NAkzept) %>%
ggplot()+
geom_bar(aes(x = Person, fill = val))+
facet_wrap(~var)
with text:
df %>%
gather(var, val, Akzept:NAkzept) %>%
ggplot()+
geom_bar(aes(x = Person, fill = val))+
geom_text(stat = "count", aes(label = after_stat(count), x = Person, group = val), position = "stack", vjust = 2, hjust = 0.5)+
facet_wrap(~var)

Why does my graph(using ggplot) vary by the use of as.factor() in R?

I'm trying to use a bar graph to observe the proportion of employees who left based on promotion.
Data:
structure(list(promo = c(0, 0, 0, 0, 1, 1), left = c(0, 0, 0,
1, 0, 1)), .Names = c("promo", "left"), row.names = c(NA, -6L
), class = "data.frame")
Case 1: I used y = as.factor(left)
ggplot(data = HR, aes(x = as.factor(promotion), y =as.factor(left), fill = factor(promotion), colour=factor(promotion))) +
geom_bar(stat="identity")+
xlab('Promotion (True or False)')+
ylab('The Employees that quit')+
ggtitle('Comparision of Employees that resigned')
This produced the following graph (Case 1).
Case 2: I used y = (left)
ggplot(data = HR, aes(x = as.factor(promotion), y = (left), fill = factor(promotion), colour=factor(promotion))) +
geom_bar(stat="identity")+
xlab('Promotion (True or False)')+
ylab('The Employees that quit')+
ggtitle('Comparision of Employees that resigned')
This produced the following graph (Case 2).
What causes this difference and which graph should I make inference from?
I'm making a guess that your data looks something like this. In the future, it's very good to share your data reproducibly so it can be copy/pasted like this. (dput() is useful to make a copy/pasteable version of an R object definition.)
df = data.frame(promo = c(rep(0, 4), rep(1, 2)),
left = c(0, 0, 0, 1, 0, 1))
df
# promo left
# 1 0 0
# 2 0 0
# 3 0 0
# 4 0 1
# 5 1 0
# 6 1 1
Your problem isn't the factorness of left. No, your problem is actually that you specify stat = 'identity' in the geom_bar(). stat = 'identity' is used when data is pre-aggregated, that is, when your data frame has the exact values you want to show up in the plot. In this case, your data has 1s and 0s, not the total number each of 1s and 0s, so stat = 'identity' is inappropriate.
In fact, you should not specify a y aesthetic at all because you do not have a column with y values - your left column has individual values that need to be aggregated to get y values, which is handled by geom_bar when stat is not 'identity'.
For counts, the graph is as simple as this:
ggplot(df, aes(x = factor(promo), fill = factor(left))) +
geom_bar()
And to make it a percentage of the total in each case, we can switch to position = 'fill':
ggplot(df, aes(x = factor(promo), fill = factor(left))) +
geom_bar(position = 'fill')
If I'm incorrect in my assumption of how your data look, please provide some sample data in your question. Data is best shared either with code to create it (as above) or via dput().
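To illustrate what "pre-aggregated" means here, the counts that geom_bar computes can also be built by hand and then plotted as-is. This is only a sketch on the guessed data above; the counts data frame and its Freq column come from base R's table() and are not part of the original question:

```r
library(ggplot2)

df <- data.frame(promo = c(rep(0, 4), rep(1, 2)),
                 left  = c(0, 0, 0, 1, 0, 1))

# aggregate by hand: one row per promo/left combination, count in Freq
# (as.data.frame() on a table turns promo and left into factors)
counts <- as.data.frame(table(promo = df$promo, left = df$left))

# the data now holds the bar heights themselves, so geom_col()
# (the same as geom_bar(stat = "identity")) is appropriate
p <- ggplot(counts, aes(x = promo, y = Freq, fill = left)) +
  geom_col()
```

Printing p should give the same stacked bars as the plain geom_bar() call on the raw data, which is the point: stat = 'identity' belongs with data shaped like counts, not with the raw 0/1 rows.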

How to limit the number of categories in a pie chart

The code below generates a pie chart by AlertTypeId. However, there are too many AlertTypeId values, and I'd like to limit the number of slices in the pie to the X most frequent alerts, with the rest going into an "Other" category. How can I do that with ggplot2?
a = c(0, 0, 0, 1, 2, 3, 3, 3)
b = c(1, 1, 0, 0, 1, 1, 1, 1)
c = c(1, 4, 2, 2, 2, 1, 1, 3)
sa2 = data.frame(WeekOfYear = a, UrgentState = b, AlertTypeId = c, IsUrgent = b)
ggplot(sa2, aes(x = factor(1), fill = factor(AlertTypeId))) +
geom_bar(width = 1) +
coord_polar(theta = "y")
There are many ways to go about it, but the basic idea is that you need to:
1. Identify which AlertTypeIds you want to select. This involves counting the number of rows per id.
2. Send to ggplot a data.frame (or data.table) containing only those rows that you want to plot.
Here is an example using data.table:
Edit: I broke this up into multiple lines to make it easier to follow
library(data.table)
sa2.DT <- data.table(sa2, key="AlertTypeId")
# we can count the rows per id, by taking the length of any other column
ATid.Counts <- sa2.DT[, list(AT.count=length(UrgentState)), by=AlertTypeId]
# then order Id's by their counts. We will then take the `head( )`
# of this vector to identify the group being kept
ATid.Ordered <- ATid.Counts[order(AT.count, decreasing=TRUE), AlertTypeId]
ATid.Ordered is the list of Ids ordered by their frequency count.
Taking head(ATid.Ordered, n) will give the top n many of those.
Since we had set the key to sa2.DT as these Ids, we can therefore use
the ordered list (or a portion of it) to subset the data.table
# select only those rows which have an AlertTypeId in the top n many
dat <- sa2.DT[.(head(ATid.Ordered, n=3)) ] # <~~ note the dot in `.( )`
dat is the data.table (or data.frame) that we will use in ggplot
# use that selection to plot
ggplot(dat, aes(x = factor(1), fill = factor(AlertTypeId))) +
geom_bar(width = 1) +
coord_polar(theta = "y")
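The subsetting above keeps only the most frequent ids and drops the remaining rows. If the remaining rows should instead be lumped into an "Other" slice, as the question asks, one possible sketch in base R is shown here. The names n_keep and AlertGroup are made up for this example:

```r
library(ggplot2)

sa2 <- data.frame(WeekOfYear  = c(0, 0, 0, 1, 2, 3, 3, 3),
                  UrgentState = c(1, 1, 0, 0, 1, 1, 1, 1),
                  AlertTypeId = c(1, 4, 2, 2, 2, 1, 1, 3))

n_keep <- 2  # hypothetical: how many of the most frequent ids get their own slice

# count rows per id and take the ids with the highest counts
counts  <- sort(table(sa2$AlertTypeId), decreasing = TRUE)
top_ids <- names(counts)[seq_len(min(n_keep, length(counts)))]

# keep the id for frequent alerts, relabel everything else as "Other"
id_chr <- as.character(sa2$AlertTypeId)
sa2$AlertGroup <- ifelse(id_chr %in% top_ids, id_chr, "Other")

p <- ggplot(sa2, aes(x = factor(1), fill = AlertGroup)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```

With the sample data, ids 1 and 2 keep their own slices and ids 3 and 4 end up together in "Other". Ties at the cutoff are broken by whatever order sort() returns, so a real dataset may need an explicit tie rule.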
