Stacked histogram in R: fill not stacking

Trying to make a stacked histogram, but it just comes out grey, with no stacking. I don't understand what is different from all the examples on here, or the built in 'iris' example, unless using time as x variable is a problem.
I have a big df, in long format, cut down to 25 rows and named 'mini' for this example:
> dput(mini)
structure(list(maxdep = c(203.9540564, 212.9573869, 13.45896065,
209.961431, 162.9633891, 13.97961439, 85.48389032, 102.4905817,
100.0035986, 88.02608837, 89.02947373, 22.0301996, 20.03060219,
19.03098037, 29.03141345, 13.03170014, 82.0328164, 55.03384725,
15.03437183, 17.53463412, 37.5352136, 70.03588457, 90.53687883,
91.53861116, 10.03902594), st_time = structure(c(1633321800,
1633328510, 1633331050, 1633331285, 1633334080, 1633347960, 1633348185,
1633355115, 1633279830, 1633298825, 1633301480, 1633302985, 1633303300,
1633303600, 1633303825, 1633304280, 1633304430, 1633305635, 1633306445,
1633306610, 1633306890, 1633307310, 1633307960, 1633309380, 1633310320
), class = c("POSIXct", "POSIXt"), tzone = ""), dbin = c(2, 2,
1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1)), row.names = c(NA, 25L), class = "data.frame")
the code is simple:
gg3 <- ggplot(data = mini, aes(x = st_time, fill = dbin)) #
gg3 <- gg3 + geom_histogram(position = "stack", binwidth = 3600) # gives hourly columns in histogram
gg3
this should plot the start time of the data on the x axis - correct, against the count on y - correct and stack in colour by dbin value (e.g. 1 through 5) - producing 5 colours of histogram stacked on top of each other (only two are present in the sample data above).
Instead I get one grey plot of all data (25 count total). please help me understand what is wrong

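You can change dbin to a factor. A numeric fill is treated as a continuous variable, so ggplot2 has no groups to stack and draws a single grey histogram:
library(ggplot2)
library(dplyr) # provides %>%
mini %>%
  ggplot(aes(x = st_time, fill = as.factor(dbin))) +
  geom_histogram(position = "stack", binwidth = 3600)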
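To see the difference for yourself, here is a minimal sketch that counts the groups ggplot2 builds in each case. The `toy` data frame is a made-up stand-in for `mini` (not the asker's data), and `layer_data()` is used only to inspect the computed layer.

```r
library(ggplot2)

# Hypothetical toy data standing in for `mini`
toy <- data.frame(
  st_time = as.POSIXct("2021-10-04 00:00:00", tz = "UTC") +
    c(0, 600, 1200, 4000, 4600, 8000),
  dbin = c(1, 2, 1, 1, 2, 1)
)

# Numeric fill: continuous, so no grouping is created
p_num <- ggplot(toy, aes(x = st_time, fill = dbin)) +
  geom_histogram(binwidth = 3600)

# Factor fill: one group per level, so the bars can be stacked
p_fac <- ggplot(toy, aes(x = st_time, fill = factor(dbin))) +
  geom_histogram(binwidth = 3600)

# Count the distinct groups in the computed layer data
n_groups_num <- length(unique(layer_data(p_num)$group))
n_groups_fac <- length(unique(layer_data(p_fac)$group))
```

The numeric version may emit a warning about the fill aesthetic, which is another hint that the mapping is being ignored.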

Related

Reorder Function in R Issues [duplicate]

I am trying to add reorder to get the data set to arrange from largest to smallest but having issues.
ggplot(Manager_Graph_Data, aes(x = reorder(Manager_Graph_Data$`Completion Rate`), y = Manager_Graph_Data$Manager)) +
geom_bar(stat="identity",
position="identity",
fill="#0077b5")
structure(list(Manager = c("Bob Beno", "Dylan Tracy", "Ignacia Lemley",
"Jaimee Cogdill", "Jeneva Engman", "Julianne Holdren", "Lakia Farrington",
"Lester Braden", "Soon Mooneyham"), Complete = c(5, 5, 1, 4,
0, 0, 3, 2, 5), Incomplete = c(6, 6, 7, 2, 3, 4, 5, 2, 3), Total = c(11,
11, 8, 6, 3, 4, 8, 4, 8), `Completion Rate` = c(0.454545454545455,
0.454545454545455, 0.125, 0.666666666666667, 0, 0, 0.375, 0.5,
0.625)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-9L))
Any help will be much appreciated
It looks like you're trying to order by the completion rate. In a bar or column chart, ggplot orders a discrete axis by factor levels (alphabetically for a character column). So to change the order, set the factor levels. There are a variety of ways to do this. Here is one way:
library(tidyverse)
# order by rate: arrange() has no `decreasing` argument, so sort ascending;
# on the y axis the last level is drawn at the top, so the highest rate comes first
MgrRate <- Manager_Graph_Data %>%
  arrange(`Completion Rate`) %>%
  mutate(Manager = ordered(Manager, levels = .$Manager))
ggplot(MgrRate,
aes(x = `Completion Rate`,
y = Manager)) +
geom_bar(stat="identity",
position="identity",
fill="#0077b5")
In case you were not aware, if you want to set an x and y, try using geom_col() to simplify things.
# alternatively (creating the same plot)
ggplot(MgrRate,
aes(x = `Completion Rate`,
y = Manager)) +
geom_col(fill="#0077b5")
If you actually wanted to order by the manager, here's an example of how to do that. (This is by the first name of the manager.)
# order by manager's first name
Mgr <- Manager_Graph_Data %>%
arrange(desc(Manager)) %>%
mutate(Manager = ordered(Manager, levels = .$Manager))
ggplot(Mgr,
aes(x = `Completion Rate`,
y = Manager)) +
geom_col(fill="#0077b5")
Just so you are aware, when you flip the axes (put the factor on y instead of x), you have to reverse the order, because the first level is drawn at the bottom of the y axis.
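Since the question tried reorder() with a single argument, here is a minimal sketch of the two-argument form, reorder(factor, values), used directly inside aes(). The `mgr` data frame and its `rate` column are made up for illustration; with the real data you would write reorder(Manager, `Completion Rate`).

```r
library(ggplot2)

# Hypothetical stand-in for Manager_Graph_Data
mgr <- data.frame(
  Manager = c("A", "B", "C"),
  rate    = c(0.45, 0.125, 0.67)
)

# reorder() sorts the levels of its first argument by the values in the second
lv <- levels(reorder(mgr$Manager, mgr$rate))

# Levels are ascending by rate, so on the y axis the highest rate ends up on top
p <- ggplot(mgr, aes(x = rate, y = reorder(Manager, rate))) +
  geom_col(fill = "#0077b5") +
  labs(y = "Manager")
```

This avoids creating a second data frame, at the cost of a slightly busier aes() call.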

How to draw a stacked barplot with three categorical variables representing the proportion of only one of them for each facet in r?

This is the data to take up as a reference
df <- data.frame(a = c(3,3,3,3,3,2,2,3,2,1,1,1,3,1,3), b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2), c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2) )
I want to draw a bargraph with the proportion of a for each facet. At the same time I want the bars to be colored according to the b value.
The variable b is not relevant for calculating the percentage. This is what I came up with; when I set fill = b, it divides the stacked color in two, one corresponding to 1, and the other as NA.
ggplot(df, aes(x = a, y = ..prop.., group = 1, fill = b)) +
geom_bar(position = "stack") +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c")
how can I have a result similar to this one but with the proportions of a for each facet wrap instead of the absolute values?
Thank you!
Here's an approach using the ..count.. and ..PANEL.. special symbols:
ggplot(df, aes(x = a, fill = as.factor(b))) +
geom_bar(aes(y = ..count.. / tapply(..count..,..PANEL..,sum)[..PANEL..])) +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c", fill = "b", y = "Proportion")
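If you weren't using facet_wrap, this would be trivial: set y = ..prop.. in aes(). However, ..prop.. is not calculated properly across facets. So, to get around this problem, we can use tapply and the ..PANEL.. special symbol to sum ..count.. only for that panel. The last [..PANEL..] subsets the resulting vector so that each bar is divided by its own panel's total. (In ggplot2 3.4.0 and later the ..name.. notation is deprecated in favour of after_stat(), but it still works with a warning.)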
The other issue you had was that b is class numeric, so you need to convert that to factor.
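If the special symbols feel too magical, a different route (not the one used above) is to pre-compute the per-facet proportions and plot them with geom_col(). This is only a sketch using base R's table() and ave(); the `counts` and `prop` names are my own.

```r
library(ggplot2)

df <- data.frame(
  a = c(3, 3, 3, 3, 3, 2, 2, 3, 2, 1, 1, 1, 3, 1, 3),
  b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2),
  c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2)
)

# Count every a/b/c combination; a, b and c become factors, Freq holds the counts
counts <- as.data.frame(table(a = df$a, b = df$b, c = df$c))

# Divide each count by the total of its facet (c), so each facet sums to 1
counts$prop <- counts$Freq / ave(counts$Freq, counts$c, FUN = sum)

# geom_col() stacks the pre-computed proportions, coloured by b
p <- ggplot(counts, aes(x = a, y = prop, fill = b)) +
  geom_col() +
  facet_wrap(~c, nrow = 1) +
  labs(title = "Proportion of a among c", y = "Proportion")
```

Because the proportions live in the data frame, they are easy to inspect or reuse as labels.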

How to make a barchart where the x-axis includes gaps

I'd like the x-axis of my barchart to be a continuous scale.
Here is my data:
list(
Century = c(1, 2, 3, 4, 5),
CenturyLabel = c("1st", "Bit later", "", "", "Post-Roman"),
Value = c(.2, .3, 0, 0, .4) ) %>% as_tibble()
I'm hoping to see bars for the 1st, 2nd, and 5th centuries with gaps for the 3rd and 4th.
The trick is to define your x-axis variable as a factor.
library("dplyr")
df <- tibble(
Century = c(1, 2, 3, 4, 5),
CenturyLabel = c("1st", "Bit later", "", "", "Post-Roman"),
Value = c(.2, .3, 0, 0, .4) )
df$CenturyFactor <- factor(df$Century, labels = df$CenturyLabel, ordered = TRUE)
You can then use CenturyFactor as the x-axis variable, and any plotting library that respects factor levels will show the gap. The big caveat is that duplicate labels cause the corresponding centuries to be merged into a single level!
One way around this is to plot Century (1 to 5) but tweak the labels to show CenturyLabel. This will be library-specific. No factors needed.
Using ggplot2:
library("ggplot2")
ggplot(df, aes(x = Century, y = Value)) +
geom_col() +
scale_x_continuous(labels = df$CenturyLabel, breaks = df$Century)
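The duplicate-label caveat is easy to check directly. This is a small sketch assuming R 3.4.0 or later, where factor() accepts repeated labels and merges them (older versions raise an error instead); the variable names are mine.

```r
# The labels from the question: the two empty strings are duplicates
century_labels <- c("1st", "Bit later", "", "", "Post-Roman")

# Build the factor the same way as above, from the century numbers 1 to 5
f <- factor(1:5, labels = century_labels)

# Five centuries go in, but the duplicated "" label collapses two of them
n_levels <- nlevels(f)

# Centuries 3 and 4 now share one level, so they would share one bar position
same_level <- f[3] == f[4]
```

That merge is why plotting the numeric Century and relabelling the axis is the safer option for this data.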

Multiple horizontal barplots in one chart

I want to have two charts containing multiple horizontal bar graphs, each showing mean values of one of the two variables: fear and expectation. The bar graphs should be grouped by the dummies.
I have created single bar graphs with the mean values of fear and expectation grouped by each of the dummies but I don't know how to combine them properly.
x = data.frame(
id = c(1, 2, 3, 4, 5),
sex = c(1, 0, 1, 0, 1),
migration = c(0, 1, 0, 1, 0),
handicap = c(0, 1, 1, 1, 0),
east = c(0, 1, 1, 1, 0),
fear = c(1, 3, 4, 6, 3),
expectation = c(2, 3, 2, 5, 4))
I want to have it look like this basically:
https://ibb.co/3fz0GQ4
Any help would be greatly appreciated.
To get to the plot you show, you will need to reshape your data a bit:
library(tidyverse)
x2 <- x%>%
gather(fear, expectation, key = "group", value = "value")%>%
gather(sex, migration, handicap, east, key = "dummies", value = "dum_value")%>%
group_by(group, dummies, dum_value)%>%
summarize(prop = mean(value))
Then you can easily get to the plot:
x2%>%
ggplot(aes(y= prop, x = dummies, fill = factor(dum_value)))+
geom_bar(stat = "identity", position = "dodge")+
coord_flip()+
facet_wrap(~group)
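For what it's worth, gather() has been superseded in tidyr by pivot_longer(). Here is a sketch of the same reshaping with pivot_longer(), assuming tidyr 1.0.0 or later; the column names match the ones used above.

```r
library(dplyr)
library(tidyr)

x <- data.frame(
  id = c(1, 2, 3, 4, 5),
  sex = c(1, 0, 1, 0, 1),
  migration = c(0, 1, 0, 1, 0),
  handicap = c(0, 1, 1, 1, 0),
  east = c(0, 1, 1, 1, 0),
  fear = c(1, 3, 4, 6, 3),
  expectation = c(2, 3, 2, 5, 4)
)

x2 <- x %>%
  # stack the two outcome columns into group/value pairs
  pivot_longer(c(fear, expectation), names_to = "group", values_to = "value") %>%
  # stack the four dummy columns into dummies/dum_value pairs
  pivot_longer(c(sex, migration, handicap, east),
               names_to = "dummies", values_to = "dum_value") %>%
  # mean outcome for each outcome x dummy x dummy-value combination
  group_by(group, dummies, dum_value) %>%
  summarise(prop = mean(value), .groups = "drop")
```

The resulting x2 can be passed to the same ggplot call.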

Using ggplot in R to create a line graph for two different groups

I'm trying to create a line graph depicting different trajectories over time for two groups/conditions. I have two groups for which the data 'eat' was collected at five time points (1,2,3,4,5).
I'd like the lines to connect the mean point for each group at each of five time points, so I'd have two points at Time 1, two points at Time 2, and so on.
Here's a reproducible example:
#Example data
library(tidyverse)
library(ggplot2)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$time <- as.factor(df$time)
df$Condition <- as.factor(df$Condition)
#Create the plot.
library(ggplot2)
ggplot(df, aes(x = time, y = eat, fill = Condition)) + geom_line() +
geom_point(size = 4, shape = 21) +
stat_summary(fun.y = mean, colour = "red", geom = "line")
The problem is, I need my lines to go horizontally (ie to show two different colored lines moving across the x-axis). But this code just connects the dots vertically:
If I don't convert Time to a factor, but only convert Condition to a factor, I get a mess of lines. The same thing happens in my actual data, as well.
I'd like it to look like this aesthetically, with the transparent error envelopes wrapping each line. However, I don't want it to be curvy, I want the lines to be straight, connecting the means at each point.
Here are the lines running in straight segments through the means at each time, with the band set to the mean plus or minus one standard error of the points at that time. One stat_summary makes the mean line with the colour aesthetic; the other makes the area using the inherited fill aesthetic. ggplot2::mean_se is a convenient function that takes a vector and returns a data frame with the mean and +/- some number of standard errors. This is the right format for the fun.data argument to stat_summary, which passes these values to the geom specified. Here, geom_ribbon accepts ymin and ymax values to plot a ribbon across the graph.
library(tidyverse)
set.seed(12345)
eat <- sample(1:7, size = 30, replace = T)
df <- data.frame(
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$Condition <- as.factor(df$Condition)
ggplot(df, aes(x = time, y = eat, fill = Condition)) +
geom_point(size = 4, shape = 21, colour = "black") +
stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.2) +
stat_summary(
mapping = aes(colour = Condition),
geom = "line",
fun = mean, # ggplot2 >= 3.3.0; older versions call this argument fun.y
show.legend = FALSE
)
Created on 2018-07-09 by the reprex package (v0.2.0).
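To make the fun.data format concrete, here is a small sketch of what mean_se() returns for a plain vector. The input values are arbitrary; `mult` is mean_se's own argument for the number of standard errors.

```r
library(ggplot2)

vals <- c(1, 2, 3)

# One-row data frame with the mean and mean -/+ one standard error
res <- mean_se(vals)

# Standard error computed by hand for comparison: sd / sqrt(n)
se_by_hand <- sd(vals) / sqrt(length(vals))

# A wider band, e.g. two standard errors either side of the mean
res_wide <- mean_se(vals, mult = 2)
```

stat_summary calls this once per x value and group, and geom_ribbon then reads ymin and ymax from the result.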
Here's my best guess at what you want:
# keep time as numeric
df$time = as.numeric(as.character(df$time))
ggplot(df, aes(x = time, y = eat, group = Condition)) +
geom_smooth(
aes(fill = Condition, linetype = Condition),
method = "lm",
level = 0.65,
color = "black",
size = 0.3
) +
geom_point(aes(color = Condition))
Setting level = 0.65 gives a band of roughly +/- 1 standard error around the linear model fit.
I think this code will get you most of the way there
library(tidyverse)
eat <- sample(1:7, size = 30, replace = TRUE)
tibble(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = factor(rep(c(0, 1), each = 15)),
time = factor(rep(c(1, 2, 3, 4, 5), 6)),
eat = eat) %>%
ggplot(aes(x = time, y = eat, fill = Condition, group = Condition)) +
geom_point(size = 4, shape = 21) +
geom_smooth()
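geom_smooth is what you were looking for, I think. By default it fits a smoother (loess for fewer than 1,000 observations) rather than a linear model; pass method = "lm" if you want a straight-line fit. Keep in mind that a smoother follows the fitted trend, so it is not guaranteed to pass exactly through the mean at each time point; stat_summary(fun = mean, geom = "line") does that.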
