I am having trouble plotting paired data with ggplot2.
I have a data frame of paired individuals (idpair identifies the pair, id the individual) and their respective sequences, such as
idpair id 1 2 3 4 5 6 7 8 9 10
1 1 1 d b d a c a d d a b
2 1 2 e d a c c d a b a c
3 2 3 e a a a a c d b c e
4 2 4 d d b c d e a a a b
...
What I would like is to plot all the sequences in a way that lets us visually distinguish each pair.
I thought of using facets, such as facet_grid(idpair~.). My issue looks like this:
How can I plot the two sequences of a pair side by side, removing the empty space in between that is caused by the ids of the other pairs?
Any suggestions of alternative plotting of paired data are very welcome.
My code
library(ggplot2)
library(dplyr)
library(reshape2)
dtmelt = dt %>% melt(id.vars = c('idpair', 'id')) %>% arrange(idpair, id, variable)
dtmelt %>% ggplot(aes(y = id, x = variable, fill = value)) +
geom_tile() + scale_fill_brewer(palette = 'Set3') +
facet_grid(idpair~.) + theme(legend.position = "none")
Code to generate the data:
dt = as.data.frame( cbind( sort( rep(1:10, 2) ) , 1:20, replicate(10, sample(letters[1:5], 20, replace = T)) ) )
colnames(dt) = c('idpair', 'id', 1:10)
You can drop the unused id levels from each facet by setting scales = "free_y" in facet_grid(). This lets the y-axis limits vary per facet, so each panel only shows the two ids of its own pair.
dtmelt %>% ggplot(aes(y = id, x = variable, fill = value)) +
geom_tile() + scale_fill_brewer(palette = 'Set3') +
facet_grid(idpair~., scales = "free_y") + theme(legend.position = "none")
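As a self-contained sketch of the same idea (not the original poster's exact pipeline): the version below rebuilds similar random data, reshapes it with tidyr::pivot_longer() instead of reshape2::melt(), and makes id an explicit factor so that scales = "free_y" has discrete levels to drop. The seed and the names seqs, dtlong and p are assumptions made for illustration.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# 20 individuals (rows) x 10 positions (columns), letters a-e
seqs <- replicate(10, sample(letters[1:5], 20, replace = TRUE))
colnames(seqs) <- 1:10

dt <- data.frame(idpair = rep(1:10, each = 2), id = 1:20, seqs,
                 check.names = FALSE, stringsAsFactors = FALSE)

# Long format: one row per (individual, position)
dtlong <- pivot_longer(dt, cols = -c(idpair, id),
                       names_to = "variable", values_to = "value")

# Keep positions in numeric order and treat id as discrete
dtlong$variable <- factor(dtlong$variable, levels = as.character(1:10))
dtlong$id <- factor(dtlong$id)

# free_y drops the ids that do not belong to a given pair's panel
p <- ggplot(dtlong, aes(x = variable, y = id, fill = value)) +
  geom_tile() +
  scale_fill_brewer(palette = "Set3") +
  facet_grid(idpair ~ ., scales = "free_y") +
  theme(legend.position = "none")
```

Printing p should show one panel per idpair with only that pair's two rows in it.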
I'm building a dynamic flexdashboard with plotly and I was wondering if there was a way to dynamically resize my dashboard. For example, I have created plots of subjects being tested over time. When I shrink the page down, what I'd like is for it to dynamically adjust to a time-series plot of the average for the group at each test day.
My data looks like this:
library(flexdashboard)
library(knitr)
library(tidyverse)
library(plotly)
subject <- rep(c("A", "B", "C"), each = 8)
testDay <- rep(1:8, times = 3)
variable1 <- rnorm(n = length(subject), mean = 30, sd = 10)
variable2 <- rnorm(n = length(subject), mean = 15, sd = 3)
df <- data.frame(subject, testDay, variable1, variable2)
subject testDay variable1 variable2
1 A 1 21.816831 8.575000
2 A 2 14.947327 17.387903
3 A 3 18.014435 16.734653
4 A 4 33.100524 11.381793
5 A 5 37.105911 13.862776
6 A 6 32.181317 10.722458
7 A 7 41.107293 9.176348
8 A 8 36.674051 17.114815
9 B 1 33.710838 17.508234
10 B 2 23.788428 13.903532
11 B 3 42.846120 17.032208
12 B 4 9.785957 15.275293
13 B 5 32.551619 21.172497
14 B 6 36.912465 18.694263
15 B 7 40.061797 13.759541
16 B 8 41.094825 15.472144
17 C 1 27.663408 17.949291
18 C 2 31.263966 11.546486
19 C 3 39.734050 19.831854
20 C 4 25.461309 19.239821
21 C 5 22.128139 10.837672
22 C 6 31.234339 16.976004
23 C 7 46.273664 19.255745
24 C 8 27.057218 21.086204
My plotly code looks like this (a graph of each subject over time):
Dynamic Chart
===========================
Row
-----------------------------------------------------------------------
```{r}
p1 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, color = subject, group = subject)) + # one line per subject
geom_line() +
theme_bw() +
ggtitle("Variable 1")
ggplotly(p1)
```
```{r}
p2 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, color = subject, group = subject)) + # one line per subject
geom_line() +
theme_bw() +
ggtitle("Variable 2")
ggplotly(p2)
```
Is there a way that when I shrink the website down these plots can dynamically change to a group average plot, like this:
p1_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, group = 1)) +
stat_summary(fun = "mean", geom = "line") + # fun.y is deprecated since ggplot2 3.3.0
theme_bw() +
ggtitle("Variable 1 Avg")
ggplotly(p1_avg)
p2_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, group = 1)) +
stat_summary(fun = "mean", geom = "line") + # fun.y is deprecated since ggplot2 3.3.0
theme_bw() +
ggtitle("Variable 2 Avg")
ggplotly(p2_avg)
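You can wrap your plotly object in plotly's renderPlotly() function so that it resizes dynamically with the page. Note that renderPlotly() needs a Shiny runtime, so the dashboard has to be run as a Shiny document. See an example of how I used the function in this blog post: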
https://medium.com/analytics-vidhya/shiny-dashboards-with-flexdashboard-e66aaafac1f2
#data
set.seed(1)
data_foo <- data.frame(id = rep(LETTERS[1:4], times = 2), group_measure = c(rep('a_c',4),rep('b_c',4), c(rep('a_d',4),rep('b_d',4))),
value = sample(1:5, size = 16, replace = TRUE))
I would like to plot the 'a' subgroups on the x axis against the 'b' subgroups on the y axis, and one plot for each measure.
Like this:
require(tidyr)
require(ggplot2)
require(patchwork)
data_foo_long <- data_foo %>% spread( group_measure, value)
p1 <- ggplot(data_foo_long, aes(x = a_c, y = b_c)) +
geom_point()
p2 <- ggplot(data_foo_long, aes(x = a_d, y = b_d)) +
geom_point()
p1 + p2
I don't see a way with faceting (?).
But I have the impression that there must be a better, more ggplot-like way of plotting the outcomes of two subgroups within a group against one another when I have them in a long format. Needless to say - there are more measures than those two.
P.S. if someone has a suggestion for a better title of this question, please feel free to comment!
Here is one way. How well it works with "more measures" I will leave to you to decide.
Use tidyr::separate to split the group_measure into a prefix and a suffix, then spread on the prefix:
library(tidyverse)
data_foo %>%
separate(group_measure,
into = c("prefix", "suffix"),
sep = "_") %>%
spread(prefix, value)
id suffix a b
1 A c 2 2
2 A d 4 4
3 B c 2 5
4 B d 1 2
5 C c 3 5
6 C d 2 4
7 D c 5 4
8 D d 1 3
Now you can plot a versus b, faceted by suffix:
data_foo %>%
separate(group_measure,
into = c("prefix", "suffix"),
sep = "_") %>%
spread(prefix, value) %>%
ggplot(aes(a, b)) +
geom_point() +
facet_wrap(~suffix)
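In current tidyr, spread() is superseded by pivot_wider(). A minimal sketch of the same reshaping with pivot_wider() follows; it rebuilds the example data with all columns written out at full length, and the intermediate names parts, wide and p are made up for illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)
data_foo <- data.frame(
  id = rep(LETTERS[1:4], times = 4),
  group_measure = rep(c("a_c", "b_c", "a_d", "b_d"), each = 4),
  value = sample(1:5, size = 16, replace = TRUE)
)

# Split "a_c" into prefix "a" (subgroup) and suffix "c" (measure)
parts <- separate(data_foo, group_measure,
                  into = c("prefix", "suffix"), sep = "_")

# One column per subgroup, one row per id and measure
wide <- pivot_wider(parts, names_from = prefix, values_from = value)

# Subgroup a against subgroup b, one panel per measure
p <- ggplot(wide, aes(a, b)) +
  geom_point() +
  facet_wrap(~suffix)
```

Additional measures only add more suffix values, so they show up as extra panels without further code changes, provided every group_measure value follows the prefix_suffix pattern.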
I used the code below to create my plot above. Is there a way to adapt my code so that I do not have the long red line joining the two periods of non-peak hours?
library(data.table) # as.ITime() and between()
library(ggplot2)
Day_2 <- non_cumul[(non_cumul$Day.No == 'Day 2'),]
Day_2$time_test <- between(as.ITime(Day_2$date_time),
as.ITime("09:00:00"),
as.ITime("17:00:00"))
Day2plot <- ggplot(Day_2,
aes(date_time, non_cumul_measurement, color = time_test)) +
geom_point()+
geom_line() +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle('Water Meter Averages (Thurs 4th Of Jan 2018)',
'Generally greater water usage between peak hours compared to non peak hours') +
xlab('Date_Times') +
ylab('Measurement in Cubic Feet') +
scale_color_discrete(name="Peak Hours?")
Day2plot +
theme(axis.title.x = element_text(face="bold", colour="black", size=10),
axis.text.x = element_text(angle=90, vjust=0.5, size=10))
From the sound of it, your plot consists of one observation for each position on the x-axis, and you want only consecutive observations of the same color to be joined together in a line.
Here's a simple example that reproduces this:
set.seed(5)
df = data.frame(
x = seq(1, 20),
y = rnorm(20),
color = c(rep("A", 5), rep("B", 9), rep("A", 6))
)
ggplot(df,
aes(x = x, y = y, color = color)) +
geom_line() +
geom_point()
The following code creates a new column "group", which takes on a different value for each collection of consecutive points with the same color. "prev.color" and "change.color" are intermediary columns, included here for clarity:
library(dplyr)
df2 <- df %>%
arrange(x) %>%
mutate(prev.color = lag(color)) %>%
mutate(change.color = is.na(prev.color) | color != prev.color) %>%
mutate(group = cumsum(change.color))
> head(df2, 10)
x y color prev.color change.color group
1 1 -0.84085548 A <NA> TRUE 1
2 2 1.38435934 A A FALSE 1
3 3 -1.25549186 A A FALSE 1
4 4 0.07014277 A A FALSE 1
5 5 1.71144087 A A FALSE 1
6 6 -0.60290798 B A TRUE 2
7 7 -0.47216639 B B FALSE 2
8 8 -0.63537131 B B FALSE 2
9 9 -0.28577363 B B FALSE 2
10 10 0.13810822 B B FALSE 2
ggplot(df2,
aes(x = x, y = y, color = color, group = group)) +
geom_line() +
geom_point()
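The same run-based grouping can also be computed with base R's rle(), which avoids the helper columns. This is only an alternative sketch on the toy data above, not a change to the approach; the names runs and p are assumptions.

```r
library(ggplot2)

set.seed(5)
df <- data.frame(
  x = seq(1, 20),
  y = rnorm(20),
  color = c(rep("A", 5), rep("B", 9), rep("A", 6)),
  stringsAsFactors = FALSE  # rle() needs an atomic vector, not a factor
)

# df must already be sorted by x for "consecutive" to be meaningful
df <- df[order(df$x), ]

# rle() finds runs of identical values; number the runs 1, 2, 3, ...
runs <- rle(df$color)
df$group <- rep(seq_along(runs$lengths), times = runs$lengths)

# Lines are drawn within each run only, so the two "A" runs are not joined
p <- ggplot(df, aes(x = x, y = y, color = color, group = group)) +
  geom_line() +
  geom_point()
```

For the water meter data, the same step would be applied to time_test after sorting by date_time.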
This is what my data frame looks like:
CatA CatB CatC
1 Y A
1 N B
1 Y C
2 Y A
3 N B
2 N C
3 Y A
4 Y B
4 N C
5 N A
5 Y B
I want CatA on the x-axis and its count on the y-axis. That plot works fine. However, I also want to group by CatB and stack by CatC, still with the count on the y-axis. This is what I have tried, and this is how it looks:
I want it to look like this:
My code:
ggplot(data, aes(factor(CatA), CatB, fill = CatC)) +
geom_bar(stat = "identity", position = "stack") +
theme_bw() + facet_grid(~ CatC)
PS: I am sorry for providing links to images. I am not able to upload them; I get an "error occurred at imgur" message every time I try.
You could use facets:
df <- data.frame(A = sample(1:5, 30, T),
B = sample(c('Y', 'N'), 30, T),
C = rep(LETTERS[1:3], 10))
ggplot(df) + geom_bar(aes(B, fill = C), position = 'stack', width = 0.9) +
facet_wrap(~A, nrow = 1) + theme(panel.spacing = unit(0, "lines"))
I'm trying to invert the order of the factor levels in only one bar in ggplot2. Reordering the data without defining the columns as factors used to work, but it does not in the newest versions.
Example:
I want to invert the factors in the last column (green up, red down).
library(ggplot2)
dados <- expand.grid(a = letters[1:5], b = letters[1:2])
dados$a <- paste(dados$a)
dados$b <- paste(dados$b)
dados$val <- rnorm(10, 5, 1)
ggplot(aes(x = a, y = val, fill = b), data = dados) + geom_bar(stat = 'identity')
dados2 <- rbind(tail(dados, -1), head(dados, 1))
ggplot(aes(x = a, y = val, fill = b), data = dados2) + geom_bar(stat = 'identity') # Used to work :/
I assigned two additional values, c and d, to column b in the rows where a is e (see below):
a b val
2 b a 4.504735
3 c a 5.396658
4 d a 6.796288
5 e c 5.900308
6 a b 3.900510
7 b b 4.454316
8 c b 5.411198
9 d b 6.389902
10 e d 4.458425
1 a a 4.986175
Then, with scale_fill_manual, I swap the two colours for those new values:
ggplot(aes(x = a, y = val, fill = b), data = dados2) +
geom_bar(stat = 'identity') +
scale_fill_manual(values = c("a"= "red", "b"= "green",'c'= "green", "d"="red"))