Combine Grouped and Stacked Bar Graph in R - r

This is what my data frame looks like:
CatA CatB CatC
1 Y A
1 N B
1 Y C
2 Y A
3 N B
2 N C
3 Y A
4 Y B
4 N C
5 N A
5 Y B
I want to have CatA on the X-axis and its count on the Y-axis. That plot works fine. However, I also want to group the bars by CatB and stack them by CatC, keeping the count on the Y-axis. This is what I have tried, and this is how it looks:
I want it to look like this:
My code:
ggplot(data, aes(factor(CatA), CatB, fill = CatC)) +
geom_bar(stat = "identity", position = "stack") +
theme_bw() + facet_grid(~ CatC)
PS: I am sorry for providing links to images because I am not able to upload it, it gives me error occurred at imgur, every time I upload.

You could use facets:
df <- data.frame(A = sample(1:5, 30, T),
B = sample(c('Y', 'N'), 30, T),
C = rep(LETTERS[1:3], 10))
ggplot(df) + geom_bar(aes(B, fill = C), position = 'stack', width = 0.9) +
facet_wrap(~A, nrow = 1) + theme(panel.spacing = unit(0, "lines"))
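For reference, here is a minimal sketch of the same facet idea applied to the eleven rows shown in the question, typed in by hand. The object names dat and p, and the choice to move the facet strips below the panels so that they read like X-axis group labels, are my own assumptions and not part of the original answer.

```r
library(ggplot2)

# The eleven rows from the question, entered manually
dat <- data.frame(
  CatA = c(1, 1, 1, 2, 3, 2, 3, 4, 4, 5, 5),
  CatB = c("Y", "N", "Y", "Y", "N", "N", "Y", "Y", "N", "N", "Y"),
  CatC = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B")
)

# One facet per CatA value; inside each facet, one bar per CatB value,
# stacked by CatC. geom_bar() counts rows, so the Y-axis is the count.
p <- ggplot(dat, aes(x = CatB, fill = CatC)) +
  geom_bar(position = "stack", width = 0.9) +
  facet_wrap(~ CatA, nrow = 1, strip.position = "bottom") +
  theme_bw() +
  theme(panel.spacing = unit(0, "lines"),
        strip.placement = "outside")

print(p)
```

With zero panel spacing the facets sit flush against each other, so the result reads as one chart with CatA groups along the X-axis.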


stacked bar chart without using fill in geom_bar?

I have some dummy data and am able to create a bar chart and a stacked bar chart:
# some data
egdf <- data.frame(
ch = c('a', 'b', 'c'),
N = c(100, 110, 120),
M = c(10, 15, 20)
)
Looks like this:
egdf
ch N M
1 a 100 10
2 b 110 15
3 c 120 20
Now some charts:
# bar chart
ggplot(egdf, aes(x = ch, y = N)) +
geom_bar(stat = 'identity')
# stacked bar chart
egdf %>%
pivot_longer(cols = c(N, M), names_to = 'metric') %>%
ggplot(aes(x = ch, y = value, fill = metric)) +
geom_bar(stat = 'identity')
My question is, is there a way to create the stacked bar chart from egdf directly without having to first transform with pivot_longer()?
[EDIT]
Why am I asking for this? My actual dataframe has some additional fields which are based on calculations off the current structure, e.g. it looks more like this:
egdf <- data.frame(
ch = c('a', 'b', 'c'),
N = c(120, 110, 100),
M = c(10, 15, 20)
) %>%
mutate(drop = N - lag(N),
drop_pct = scales::percent(drop / N),
Rate = scales::percent(M / N))
egdf
ch N M drop drop_pct Rate
1 a 120 10 NA <NA> 8.3%
2 b 110 15 -10 -9.09% 13.6%
3 c 100 20 -10 -10.00% 20.0%
In my plot, I'm adding some additional geoms. If I were to use pivot_longer(), these relationships would be broken. If I could somehow tell ggplot to make a stacked bar based only on feature1 and feature2 (N and M in the example), it would be much easier for this particular use case.
Update: See valuable comment of stefan:
ggplot(egdf, aes(x=ch, y=N+M)) +
geom_col(aes(fill="N")) +
geom_col(aes(x=ch, y=M, fill="M")) +
ylab("N") +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10))
First answer:
Are you looking for such a solution?
ggplot(egdf, aes(x=ch, y=N)) +
geom_col(aes(fill="N")) +
geom_col(aes(x=ch, y=M, fill="M"))
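Another option, not taken from the answers above, is to keep egdf wide as the plot data and reshape it only inside the bar layer. A layer's data argument accepts a function or a one-sided formula that receives the plot data, so the pivot can live there while other geoms still see the wide columns. This is a rough sketch; the geom_text() layer is only a hypothetical stand-in for the "additional geoms" mentioned in the question.

```r
library(ggplot2)
library(tidyr)

egdf <- data.frame(
  ch = c("a", "b", "c"),
  N = c(100, 110, 120),
  M = c(10, 15, 20)
)

p <- ggplot(egdf, aes(x = ch)) +
  # Only this layer sees the long data; .x is the wide plot data
  geom_col(
    data = ~ pivot_longer(.x, cols = c(N, M), names_to = "metric"),
    aes(y = value, fill = metric)
  ) +
  # Hypothetical extra layer that still uses the wide columns directly
  geom_text(aes(y = N + M, label = N + M), vjust = -0.5)

print(p)
```

The pivot still happens, but it is confined to one layer, so calculated columns such as drop_pct or Rate stay one row per ch for every other geom.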

Dynamic Re-sizing Flexdashboard/Plotly Image to a Different Graphic

I'm building a dynamic flexdashboard with plotly and I was wondering if there was a way to dynamically resize my dashboard. For example, I have created plots of subjects being tested over time. When I shrink the page down, what I'd like is for it to dynamically adjust to a time-series plot of the average for the group at each test day.
My data looks like this:
library(flexdashboard)
library(knitr)
library(tidyverse)
library(plotly)
subject <- rep(c("A", "B", "C"), each = 8)
testDay <- rep(1:8, times = 3)
variable1 <- rnorm(n = length(subject), mean = 30, sd = 10)
variable2 <- rnorm(n = length(subject), mean = 15, sd = 3)
df <- data.frame(subject, testDay, variable1, variable2)
subject testDay variable1 variable2
1 A 1 21.816831 8.575000
2 A 2 14.947327 17.387903
3 A 3 18.014435 16.734653
4 A 4 33.100524 11.381793
5 A 5 37.105911 13.862776
6 A 6 32.181317 10.722458
7 A 7 41.107293 9.176348
8 A 8 36.674051 17.114815
9 B 1 33.710838 17.508234
10 B 2 23.788428 13.903532
11 B 3 42.846120 17.032208
12 B 4 9.785957 15.275293
13 B 5 32.551619 21.172497
14 B 6 36.912465 18.694263
15 B 7 40.061797 13.759541
16 B 8 41.094825 15.472144
17 C 1 27.663408 17.949291
18 C 2 31.263966 11.546486
19 C 3 39.734050 19.831854
20 C 4 25.461309 19.239821
21 C 5 22.128139 10.837672
22 C 6 31.234339 16.976004
23 C 7 46.273664 19.255745
24 C 8 27.057218 21.086204
My plotly code looks like this (a graph of each subject over time):
Dynamic Chart
===========================
Row
-----------------------------------------------------------------------
```{r}
p1 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, color = subject, group = subject)) +
geom_line() +
theme_bw() +
ggtitle("Variable 1")
ggplotly(p1)
```
```{r}
p2 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, color = subject, group = subject)) +
geom_line() +
theme_bw() +
ggtitle("Variable 2")
ggplotly(p2)
```
Is there a way that when I shrink the website down these plots can dynamically change to a group average plot, like this:
p1_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, group = 1)) +
stat_summary(fun = "mean", geom = "line") +
theme_bw() +
ggtitle("Variable 1 Avg")
ggplotly(p1_avg)
p2_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, group = 1)) +
stat_summary(fun = "mean", geom = "line") +
theme_bw() +
ggtitle("Variable 2 Avg")
ggplotly(p2_avg)
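You can wrap your plotly object in plotly's renderPlotly() function so that the plot resizes dynamically with the page. Note that renderPlotly() is a Shiny render function, so the dashboard needs runtime: shiny in its YAML header. See how I used the function in this blog post: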
https://medium.com/analytics-vidhya/shiny-dashboards-with-flexdashboard-e66aaafac1f2

How do I join points within a ggplot in R properly?

I used the code below to create my plot above. Is there a way to adapt my code so that I do not have the long red line joining the two periods of non-peak hours?
Day_2 <- non_cumul[(non_cumul$Day.No == 'Day 2'),]
Day_2$time_test <- between(as.ITime(Day_2$date_time),
as.ITime("09:00:00"),
as.ITime("17:00:00"))
Day2plot <- ggplot(Day_2,
aes(date_time, non_cumul_measurement, color = time_test)) +
geom_point()+
geom_line() +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle('Water Meter Averages (Thurs 4th Of Jan 2018)',
'Generally greater water usage between peak hours compared to non peak hours') +
xlab('Date_Times') +
ylab('Measurement in Cubic Feet') +
scale_color_discrete(name="Peak Hours?")
Day2plot +
theme(axis.title.x = element_text(face="bold", colour="black", size=10),
axis.text.x = element_text(angle=90, vjust=0.5, size=10))
From the sound of it, your plot comprises one observation for each position on the x-axis, and you want consecutive observations of the same color to be joined together in a line.
Here's a simple example that reproduces this:
set.seed(5)
df = data.frame(
x = seq(1, 20),
y = rnorm(20),
color = c(rep("A", 5), rep("B", 9), rep("A", 6))
)
ggplot(df,
aes(x = x, y = y, color = color)) +
geom_line() +
geom_point()
The following code creates a new column "group", which takes on a different value for each collection of consecutive points with the same color. "prev.color" and "change.color" are intermediary columns, included here for clarity:
library(dplyr)
df2 <- df %>%
arrange(x) %>%
mutate(prev.color = lag(color)) %>%
mutate(change.color = is.na(prev.color) | color != prev.color) %>%
mutate(group = cumsum(change.color))
> head(df2, 10)
x y color prev.color change.color group
1 1 -0.84085548 A <NA> TRUE 1
2 2 1.38435934 A A FALSE 1
3 3 -1.25549186 A A FALSE 1
4 4 0.07014277 A A FALSE 1
5 5 1.71144087 A A FALSE 1
6 6 -0.60290798 B A TRUE 2
7 7 -0.47216639 B B FALSE 2
8 8 -0.63537131 B B FALSE 2
9 9 -0.28577363 B B FALSE 2
10 10 0.13810822 B B FALSE 2
ggplot(df2,
aes(x = x, y = y, color = color, group = group)) +
geom_line() +
geom_point()
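If you would rather not load dplyr, the same run-based grouping can be sketched in base R. The helper name run_id is my own invention, and it assumes the data are already sorted by x and that the color column contains no NA values.

```r
library(ggplot2)

set.seed(5)
df <- data.frame(
  x = seq(1, 20),
  y = rnorm(20),
  color = c(rep("A", 5), rep("B", 9), rep("A", 6))
)

# Hypothetical helper: start a new group whenever the value differs
# from the previous element (assumes a non-empty vector without NAs)
run_id <- function(v) cumsum(c(TRUE, v[-1] != v[-length(v)]))

df$group <- run_id(df$color)

# Lines are drawn within each run, so the two "A" runs are not joined
p <- ggplot(df, aes(x = x, y = y, color = color, group = group)) +
  geom_line() +
  geom_point()

print(p)
```

Here the first five points get group 1, the next nine get group 2, and the last six get group 3, which matches the group column produced by the dplyr version.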

r - ggplot paired seq

I am having trouble plotting paired data with ggplot2.
So, I have a database with paired (idpair) individuals (id) and their respective sequences, such as
idpair id 1 2 3 4 5 6 7 8 9 10
1 1 1 d b d a c a d d a b
2 1 2 e d a c c d a b a c
3 2 3 e a a a a c d b c e
4 2 4 d d b c d e a a a b
...
What I would like is to plot all the sequences in a way that makes each pair visually distinguishable.
I thought of using a facet grid, such as facet_grid(idpair~.). My issue looks like this:
How could I plot the two sequences of a pair side by side, removing the "vacuum" in between caused by the other idpair values?
Any suggestions of alternative plotting of paired data are very welcome.
My code
library(ggplot2)
library(dplyr)
library(reshape2)
dtmelt = dt %>% melt(id.vars = c('idpair', 'id')) %>% arrange(idpair, id, variable)
dtmelt %>% ggplot(aes(y = id, x = variable, fill = value)) +
geom_tile() + scale_fill_brewer(palette = 'Set3') +
facet_grid(idpair~.) + theme(legend.position = "none")
generate the data
dt = as.data.frame( cbind( sort( rep(1:10, 2) ) , 1:20, replicate(10, sample(letters[1:5], 20, replace = T)) ) )
colnames(dt) = c('idpair', 'id', 1:10)
You can remove the unused levels in the facet by setting scales = "free_y". This will vary the y-axis limits for each facet.
dtmelt %>% ggplot(aes(y = id, x = variable, fill = value)) +
geom_tile() + scale_fill_brewer(palette = 'Set3') +
facet_grid(idpair~., scales = "free_y") + theme(legend.position = "none")
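For anyone who wants to run this end to end, here is a self-contained sketch under a few assumptions of my own: the data are built directly with data.frame() rather than cbind(), tidyr::pivot_longer() stands in for reshape2::melt(), and the column names position and state are invented for the example.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# 10 pairs, 20 individuals, 10 sequence positions with states a to e
dt <- data.frame(
  idpair = rep(1:10, each = 2),
  id = 1:20,
  replicate(10, sample(letters[1:5], 20, replace = TRUE))
)
names(dt)[3:12] <- 1:10

long <- pivot_longer(dt, cols = -c(idpair, id),
                     names_to = "position", values_to = "state")
long$position <- factor(long$position, levels = as.character(1:10))
long$id <- factor(long$id)  # discrete y, so unused ids can be dropped

# scales = "free_y" lets each facet keep only the two ids of its pair
p <- ggplot(long, aes(x = position, y = id, fill = state)) +
  geom_tile() +
  scale_fill_brewer(palette = "Set3") +
  facet_grid(idpair ~ ., scales = "free_y") +
  theme(legend.position = "none")

print(p)
```

Treating id as a factor matters here: with a discrete y-axis, the free scale shows only the ids that actually occur in each facet.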

ggplot2 facets: Different annotation text for each plot

I have the following generated data frame called Raw_Data:
Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c
I plotted this data with ggplot2:
ggplot(Raw_Data, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
I have the objects: Regression_a, Regression_b, Regression_c. These are the linear regression equations for each plot. Each plot should display the corresponding equation.
Using annotate displays the same equation on every plot:
annotate("text", x = 1.78, y = 5, label = Regression_a, color="black", size = 5, parse=FALSE)
I tried to overcome the issue with the following code:
Regression_a_eq <- data.frame(x = 1.78, y = 1,label = Regression_a,
Type = "a")
p <- x + geom_text(data = Raw_Data,label = Regression_a)
This did not solve the problem. Every plot still showed Regression_a, rather than only plot a.
You can put the expressions as character values in a new data frame that has the same unique Type values as your data, and add them with geom_text:
regrDF <- data.frame(Type = c('a','b','c'), lbl = c('Regression_a', 'Regression_b', 'Regression_c'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type ~.)
which gives:
You can replace the text values in regrDF$lbl with the appropriate expressions.
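As a rough illustration of filling regrDF with real equations, the labels can be computed per Type with lm(). The question does not show how Regression_a and the others were built, so the label format below is purely an assumption, as are the extra slope column and the use of -Inf/Inf to pin the text to the top-left corner of each panel.

```r
library(ggplot2)

# Raw_Data as printed in the question
Raw_Data <- data.frame(
  Time = rep(seq(10, 50, by = 10), times = 3),
  Velocity = c(1, 2, 3, 4, 5, 2, 4, 6, 8, 9, 3, 6, 9, 11, 13),
  Type = rep(c("a", "b", "c"), each = 5)
)

# One row per facet: fit a line within each Type and format its equation
regrDF <- do.call(rbind, lapply(split(Raw_Data, Raw_Data$Type), function(d) {
  cf <- coef(lm(Velocity ~ Time, data = d))
  data.frame(
    Type = d$Type[1],
    slope = cf[["Time"]],
    lbl = sprintf("Velocity = %.2f + %.2f * Time",
                  cf[["(Intercept)"]], cf[["Time"]])
  )
}))

# The Type column in regrDF routes each label to its own facet
p <- ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
  geom_point() +
  geom_text(data = regrDF, aes(x = -Inf, y = Inf, label = lbl),
            hjust = -0.05, vjust = 1.5) +
  facet_grid(Type ~ .)

print(p)
```

The key point is unchanged from the answer: the label data frame needs the faceting variable, otherwise every label is drawn in every panel.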
A supplement to the accepted answer, for the case where there are facets in both the horizontal and vertical directions:
regrDF <- data.frame(Type1 = c('a','a','b','b'),
Type2 = c('c','d','c','d'),
lbl = c('Regression_ac', 'Regression_ad', 'Regression_bc', 'Regression_bd'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type1 ~ Type2)
This works, but it is still imperfect, as I do not know how to combine math expressions and newlines in the same label (see "Adding a newline in a substitute() expression").
