So I have a simple data frame where the first column includes roadway IDs and the next 10 columns have traffic volumes on each roadway ID over 10 years.
I have been trying to write code that displays the roadway ID on the X axis and the traffic volume on the Y axis, and then animates the graph over the years (so the traffic volumes on the Y axis change). Here is a sample of my data frame (originally posted as a picture):
Could anyone suggest some code to do this? Here is what I have written, but it doesn't really work. I know it may be very wrong; I am very new to gganimate and not sure how to get the different functions to work together. Any help is appreciated.
year <- c(2001,2002,2003,2004,2005,2006,2007,2008,2009,2010)
p1 <- ggplot(data = Data) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2001Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2002Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2003Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2004Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2005Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2006Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2007Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2008Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2009Traffic)) +
geom_point(aes(x = Data$LinkIDs, y=Data$Year2010Traffic)) +
labs(title = 'Year: {frame_time}', x = 'Link ID', y = 'Traffic Volume') +
transition_time(year)
animate(p1)
Most of the work lies in changing the data before you send it to ggplot and gganimate. To help you with that work, I have created some sample data based on your picture (in the future please supply sample data yourself).
library(tidyverse)
library(gganimate)
df <- tribble(
~LinkIDs, ~Year2001Traffic, ~Year2002Traffic, ~Year2003Traffic,
"A", 1, 10, 15,
"B", 3, 1, 10,
"C", 10, 5, 1)
df
# A tibble: 3 x 4
LinkIDs Year2001Traffic Year2002Traffic Year2003Traffic
<chr> <dbl> <dbl> <dbl>
1 A 1 10 15
2 B 3 1 10
3 C 10 5 1
gganimate and ggplot work best with data in long format. So the first step is to change the data from wide to long before sending it to ggplot.
df <- df %>% gather(Year, Traffic, -LinkIDs)
df
# A tibble: 9 x 3
LinkIDs Year Traffic
<chr> <chr> <dbl>
1 A Year2001Traffic 1
2 B Year2001Traffic 3
3 C Year2001Traffic 10
4 A Year2002Traffic 10
5 B Year2002Traffic 1
6 C Year2002Traffic 5
7 A Year2003Traffic 15
8 B Year2003Traffic 10
9 C Year2003Traffic 1
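gather() is superseded in current tidyr (pivot_longer() is the recommended replacement), but the reshaping itself is simple. As a rough base-R sketch that rebuilds the three-link sample by hand, the wide-to-long step repeats each LinkID once per year column and stacks the year columns into a single Traffic column:

```r
# Rebuild the wide sample from above as a plain data.frame
wide <- data.frame(
  LinkIDs         = c("A", "B", "C"),
  Year2001Traffic = c(1, 3, 10),
  Year2002Traffic = c(10, 1, 5),
  Year2003Traffic = c(15, 10, 1)
)

year_cols <- setdiff(names(wide), "LinkIDs")

# Stack the year columns: one row per (LinkID, year column) pair
long <- data.frame(
  LinkIDs = rep(wide$LinkIDs, times = length(year_cols)),
  Year    = rep(year_cols, each = nrow(wide)),
  Traffic = unlist(wide[year_cols], use.names = FALSE)
)
long
```

The row order matches the gather() output shown above: all links for 2001 first, then 2002, then 2003.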
gganimate needs the Year column to be a number before it can use it for animation. So we need to extract the numbers that are contained in the values.
df <- df %>% mutate(
Year = parse_number(Year))
df
# A tibble: 9 x 3
LinkIDs Year Traffic
<chr> <dbl> <dbl>
1 A 2001 1
2 B 2001 3
3 C 2001 10
4 A 2002 10
5 B 2002 1
6 C 2002 5
7 A 2003 15
8 B 2003 10
9 C 2003 1
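readr's parse_number() drops the non-numeric characters around the first number it finds. For labels as regular as these, a base-R regular expression gives the same result. This sketch assumes every label contains exactly one run of digits:

```r
year_labels <- c("Year2001Traffic", "Year2002Traffic", "Year2003Traffic")

# Remove every non-digit character, then convert what is left to a number
years <- as.numeric(gsub("[^0-9]", "", year_labels))
years
```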
Now the rest is straightforward: just plot the data and use the Year variable for the animation in transition_time().
p1 <- ggplot(df, aes(x = LinkIDs, y = Traffic))+
geom_point()+
labs(title = 'Year: {frame_time}', x = 'Link ID', y = 'Traffic Volume')+
transition_time(Year)
animate(p1)
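transition_time() does not only show the observed years; it tweens between them, which is also why {frame_time} can show fractional years on intermediate frames. Assuming the default linear easing, the value drawn for link A between two observed years is a straight-line interpolation, which base R's approx() can illustrate (this only approximates what gganimate renders):

```r
# Observed traffic for link A in the sample data
years   <- c(2001, 2002, 2003)
traffic <- c(1, 10, 15)

# Linear interpolation at half-year steps, approximating what the
# default linear tweening shows between the observed years
frames <- approx(x = years, y = traffic, xout = seq(2001, 2003, by = 0.5))
data.frame(frame_time = frames$x, Traffic = frames$y)
```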
_________________________ EDIT AFTER UPDATED COMMENTS_______
Request in comments:
"I just want it to go through the timeline (from 2001 to 2003) just
once and then stop at 2003."
If you want to stop at the year 2003, you need to filter the data before you send it to ggplot; this is done with filter().
As of 23/3 2019, there is, as far as I know, no way to go through the animation just once. You can, however, use the end_pause argument to insert a pause after each iteration of the animation (I also changed geom_point() to geom_col() given your description). Current gganimate versions also let you pass renderer = gifski_renderer(loop = FALSE) to animate() so that the GIF plays only once.
p2 <- df %>%
#keep only observations from the year 2003 and earlier
filter(Year <= 2003) %>%
#Send the data to plot
ggplot(aes(x = LinkIDs, y = Traffic, fill = LinkIDs))+
geom_col()+
labs(title = 'Year: {frame_time}', x = 'Link ID', y = 'Traffic Volume')+
transition_time(Year)
animate(p2, fps = 20, duration = 25, end_pause = 95)
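For reference, here is how the animate() settings above translate into frames. This is a small arithmetic sketch that assumes the frame count is fps × duration when duration is given, and that every frame is shown for 1/fps seconds; frame_budget() is a hypothetical helper, not part of gganimate.

```r
# Hypothetical helper: frame arithmetic for animate(fps, duration, end_pause)
frame_budget <- function(fps, duration, end_pause) {
  list(
    total_frames  = fps * duration,  # frames rendered when duration is set
    pause_seconds = end_pause / fps  # how long the final frame is held
  )
}

b <- frame_budget(fps = 20, duration = 25, end_pause = 95)
b
```

With these settings that is 500 frames, and the last frame is held for 4.75 seconds.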
I'm trying to represent the movements of patients between several treatment groups measured in 3 different years. However, there are dropouts: some patients from the 1st year are missing in the 2nd year, and there are patients in the 2nd year who weren't in the 1st. The same happens in the 3rd year. I have a label called "none" for these combinations, but I don't want it to appear in the plot.
An example plot with only 2 years:
EDIT
I have tried with geom_sankey as well (https://rdrr.io/github/davidsjoberg/ggsankey/man/geom_sankey.html).
It is closer to what I'm looking for, but I don't know how to omit the stratum groups without labels (NA). In this case I'm using my full data, not a dummy example. I can't share it, but I can try to create an example if needed. This is the code I've tried:
data = bind_rows(data_2015,data_2017,data_2019) %>%
select(sip, Year, Grp) %>%
mutate(Grp = factor(Grp), Year = factor(Year)) %>%
arrange(sip) %>%
pivot_wider(names_from = Year, values_from = Grp)
df_sankey = data %>% make_long(`2015`,`2017`,`2019`)
ggplot(df_sankey, aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node),
label = node,
color=factor(node) )) +
geom_sankey(flow.alpha = 0.5, node.color = 1) +
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d() +
scale_colour_viridis_d() +
theme_sankey(base_size = 16) +
theme(legend.position = "none") + xlab('')
Any idea how to omit the missing groups as a stratum in each year (without omitting them from the alluvium) would be super helpful. Thanks!
Solved! The solution was much easier than I thought. I'll leave it here in case someone else struggles with a similar problem.
Create a wide table with one row per patient (sip) and one column per cohort year, holding that patient's group.
# Data with 3 cohorts for years 2015, 2017 and 2019
# Grp is a factor with 6 levels: 1 to 6
# sip is a unique ID
library(tidyverse)
data_wide = data %>%
select(sip, Year, Grp) %>%
mutate(Grp = factor(Grp, levels=c(1:6)), Year = factor(Year)) %>%
arrange(sip) %>%
pivot_wider(names_from = Year, values_from = Grp)
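The NA values that show up later come from this widening step: any (patient, year) combination that does not occur in the long data has nothing to fill its cell. A base-R sketch with reshape() on a small hypothetical table (patient 1 drops out after 2015, patient 3 only appears from 2017) makes this visible:

```r
# Hypothetical toy cohorts, not the real data
long <- data.frame(
  sip  = c(1, 2, 2, 2, 3, 3),
  Year = c(2015, 2015, 2017, 2019, 2017, 2019),
  Grp  = c(3, 1, 2, 2, 1, 1)
)

# One row per patient, one Grp.<year> column per cohort; absent combinations become NA
wide <- reshape(long, idvar = "sip", timevar = "Year", direction = "wide")
wide
```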
The ggsankey package expects the data in a specific long format, and it already provides a useful function for this transformation: make_long().
library(ggsankey)
df_sankey = data_wide %>% make_long(`2015`,`2017`,`2019`)
# The tibble accounts for every change in X axis and Y categorical value (node):
> head(df_sankey)
# A tibble: 6 × 4
x node next_x next_node
<fct> <chr> <fct> <chr>
1 2015 3 2017 2
2 2017 2 2019 2
3 2019 2 NA NA
4 2015 NA 2017 1
5 2017 1 2019 1
6 2019 1 NA NA
It looks like passing the pivot_wider() output to make_long() keeps every patient in every year, with NA as the node wherever a patient is missing from that cohort. Drop the NA values in node and create the plot:
df_sankey %>% drop_na(node) %>%
ggplot(aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node),
label = node,
color=factor(node) )) +
geom_sankey(flow.alpha = 0.5, node.color = 1) +
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d() +
scale_colour_viridis_d() +
theme_sankey(base_size = 16) +
theme(legend.position = "none") + xlab('')
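drop_na(node) simply keeps the rows whose node is not missing. Using the six rows of head(df_sankey) printed above, typed in by hand, the equivalent base-R filter is:

```r
# The six rows of head(df_sankey) shown above, entered by hand
df_sankey <- data.frame(
  x         = c("2015", "2017", "2019", "2015", "2017", "2019"),
  node      = c("3", "2", "2", NA, "1", "1"),
  next_x    = c("2017", "2019", NA, "2017", "2019", NA),
  next_node = c("2", "2", NA, "1", "1", NA)
)

# Keep only rows with a real stratum; the 2015 row with node NA is dropped
kept <- df_sankey[!is.na(df_sankey$node), ]
kept
```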
I have some data about events happening at certain hours of the day under certain conditions.
The data frame looks something like this:
> tibble(event_id = 1:1000, hour = rep_len(0:23, 1000), conditions = rep_len(c("Non", "Oui"), 1000))
# A tibble: 1,000 × 3
event_id hour conditions
<int> <int> <chr>
1 1 0 Non
2 2 1 Oui
3 3 2 Non
4 4 3 Oui
5 5 4 Non
6 6 5 Oui
7 7 6 Non
8 8 7 Oui
9 9 8 Non
10 10 9 Oui
I have managed to represent it with geom_bar() this way:
mydataframe %>%
group_by(hour, conditions) %>%
count() %>%
ggplot() +
geom_bar(aes(x = hour, y = n, fill = conditions), stat = "identity", position = "dodge")
With my actual data, I get a figure that looks like this:
But I would like something like two dodged smooth lines, or a geom_density(), which I can't seem to get working.
Do you have any ideas to help me?
Thank you
library(tidyverse)
set.seed(42)
mydataframe <- tibble(event_id = 1:1000, hour = rep_len(0:23, 1000), conditions = sample(c("Non", "Oui"), 1000, replace = TRUE))
mydataframe %>%
count(hour, conditions) %>%
ggplot() +
geom_smooth(aes(hour, n, color = conditions), se = FALSE, span = 0.3)
Or, if you want to dodge them, you could do this and tweak the dodge width between the series:
mydataframe %>%
count(hour, conditions) %>%
ggplot() +
geom_smooth(aes(hour, n, color = conditions), se = FALSE, span = 0.3,
position = position_dodge(width = 1))
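With fewer than 1,000 observations, geom_smooth() fits a loess curve by default, and span controls how local that fit is (smaller values follow the hourly counts more closely). The sketch below reproduces the idea in base R for the simulated data, with table() and stats::loess() standing in for count() and geom_smooth(): count events per hour and condition, then fit one loess curve per condition with span = 0.3.

```r
set.seed(42)
mydataframe <- data.frame(
  event_id   = 1:1000,
  hour       = rep_len(0:23, 1000),
  conditions = sample(c("Non", "Oui"), 1000, replace = TRUE)
)

# Events per hour and condition (what count(hour, conditions) returns)
counts <- as.data.frame(table(hour = mydataframe$hour,
                              conditions = mydataframe$conditions))
counts$hour <- as.numeric(as.character(counts$hour))

# One loess fit per condition, evaluated at each hour
smoothed <- lapply(split(counts, counts$conditions), function(d) {
  fit <- loess(Freq ~ hour, data = d, span = 0.3)
  data.frame(hour = d$hour, conditions = d$conditions, fitted = predict(fit, d))
})
smoothed <- do.call(rbind, smoothed)
head(smoothed)
```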
I'm learning to use ggplot to plot my data. I found many examples, such as "ggplot multiple grouping bar" and "Grouped bar plot in ggplot", but I can't adapt them to my data yet.
This is what the sample looks like:
# A tibble: 10 x 3
clusterNum Road period
<dbl> <chr> <chr>
1 2 Hualampong 06.00-06.15
2 2 Hualampong 06.00-06.15
3 2 Hualampong 06.16-06.30
4 2 Hualampong 06.16-06.30
5 2 Hualampong 06.16-06.30
6 3 Hualampong 06.16-06.30
7 2 Hualampong 06.16-06.30
8 3 Tonglor 17.46-18.00
9 3 Tonglor 17.46-18.00
10 3 Tonglor 17.46-18.00
data <- structure(list(clusterNum = c(2, 2, 2, 2, 2, 3, 2, 3, 3, 3),Road = c("Hualampong", "Hualampong", "Hualampong", "Hualampong","Hualampong", "Hualampong", "Hualampong", "Tonglor", "Tonglor","Tonglor"), period = c("06.00-06.15", "06.00-06.15", "06.16-06.30","06.16-06.30", "06.16-06.30", "06.16-06.30", "06.16-06.30","17.46-18.00", "17.46-18.00", "17.46-18.00")), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))
As you can see from my data, I want to create bar charts showing the total number of clusterNum for each period, separately for each value of the Road column, so I might end up with two graphs, one per road.
My expected graph might look like this:
Thanks for any help.
Or if you're looking for separate graphs, you can use facet_wrap:
library(tidyverse)
data2 <- data %>% group_by(period, Road) %>% summarise(clusterNum = sum(clusterNum))
ggplot(data2, aes(x = period, y = clusterNum, fill = period)) +
geom_bar(position = "dodge", stat = "identity") +
facet_wrap(~Road)
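The summarise() step produces one row per period and Road with the summed clusterNum. As a quick check, the same totals can be computed in base R with aggregate() on the ten sample rows:

```r
# The ten sample rows from the question
data <- data.frame(
  clusterNum = c(2, 2, 2, 2, 2, 3, 2, 3, 3, 3),
  Road   = c(rep("Hualampong", 7), rep("Tonglor", 3)),
  period = c(rep("06.00-06.15", 2), rep("06.16-06.30", 5), rep("17.46-18.00", 3))
)

# Sum clusterNum within each period/Road combination
# Hualampong 06.00-06.15 -> 4, Hualampong 06.16-06.30 -> 11, Tonglor 17.46-18.00 -> 9
totals <- aggregate(clusterNum ~ period + Road, data = data, FUN = sum)
totals
```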
With an additional breakout by clusterNum:
library(tidyverse)
data3 <- data %>% group_by(period, Road, clusterNum) %>%
count() %>%
data.frame()
# keep n numeric so it maps to a continuous y axis (as a factor the axis becomes discrete)
data3$clusterNum <- as.factor(data3$clusterNum)
ggplot(data3, aes(x = period, y = n, fill = clusterNum)) +
geom_bar(position = "dodge", stat = "identity") +
facet_wrap(~Road) +
theme_minimal()
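The bar heights in this chart are just the row counts for each period × Road × clusterNum combination. A base-R sketch with table(), dropping the combinations that never occur, gives the same numbers:

```r
# The ten sample rows from the question
data <- data.frame(
  clusterNum = c(2, 2, 2, 2, 2, 3, 2, 3, 3, 3),
  Road   = c(rep("Hualampong", 7), rep("Tonglor", 3)),
  period = c(rep("06.00-06.15", 2), rep("06.16-06.30", 5), rep("17.46-18.00", 3))
)

# Count rows per period/Road/clusterNum and drop empty combinations
counts <- as.data.frame(table(period = data$period, Road = data$Road,
                              clusterNum = data$clusterNum))
counts <- counts[counts$Freq > 0, ]
counts
```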
Maybe something like this:
library(tidyverse)
data1 <- data %>%
group_by(clusterNum, Road, period) %>%
count()
ggplot(data1, aes(x=period, y=n, group=clusterNum)) +
geom_bar(aes(fill = Road),
position = "dodge",
stat = "identity")
I'm building a dynamic flexdashboard with plotly and I was wondering if there was a way to dynamically resize my dashboard. For example, I have created plots of subjects being tested over time. When I shrink the page down, what I'd like is for it to dynamically adjust to a time-series plot of the average for the group at each test day.
My data looks like this:
library(flexdashboard)
library(knitr)
library(tidyverse)
library(plotly)
subject <- rep(c("A", "B", "C"), each = 8)
testDay <- rep(1:8, times = 3)
variable1 <- rnorm(n = length(subject), mean = 30, sd = 10)
variable2 <- rnorm(n = length(subject), mean = 15, sd = 3)
df <- data.frame(subject, testDay, variable1, variable2)
subject testDay variable1 variable2
1 A 1 21.816831 8.575000
2 A 2 14.947327 17.387903
3 A 3 18.014435 16.734653
4 A 4 33.100524 11.381793
5 A 5 37.105911 13.862776
6 A 6 32.181317 10.722458
7 A 7 41.107293 9.176348
8 A 8 36.674051 17.114815
9 B 1 33.710838 17.508234
10 B 2 23.788428 13.903532
11 B 3 42.846120 17.032208
12 B 4 9.785957 15.275293
13 B 5 32.551619 21.172497
14 B 6 36.912465 18.694263
15 B 7 40.061797 13.759541
16 B 8 41.094825 15.472144
17 C 1 27.663408 17.949291
18 C 2 31.263966 11.546486
19 C 3 39.734050 19.831854
20 C 4 25.461309 19.239821
21 C 5 22.128139 10.837672
22 C 6 31.234339 16.976004
23 C 7 46.273664 19.255745
24 C 8 27.057218 21.086204
My plotly code looks like this (a graph of each subject over time):
Dynamic Chart
===========================
Row
-----------------------------------------------------------------------
```{r}
p1 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, color = subject, group = subject)) +
geom_line() +
theme_bw() +
ggtitle("Variable 1")
ggplotly(p1)
```
```{r}
p2 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, color = subject, group = subject)) +
geom_line() +
theme_bw() +
ggtitle("Variable 2")
ggplotly(p2)
```
Is there a way for these plots to change dynamically to a group-average plot when I shrink the page down, like this:
p1_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, group = 1)) +
stat_summary(fun = "mean", geom = "line") +
theme_bw() +
ggtitle("Variable 1 Avg")
ggplotly(p1_avg)
p2_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, group = 1)) +
stat_summary(fun = "mean", geom = "line") +
theme_bw() +
ggtitle("Variable 2 Avg")
ggplotly(p2_avg)
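stat_summary(fun = "mean", geom = "line") draws the mean across subjects at each testDay. Computing those means directly with aggregate() is a quick way to check the average plots. This sketch regenerates the simulated data with a fixed seed; set.seed(1) is an addition, so the numbers differ from the table above.

```r
set.seed(1)  # added for reproducibility; not in the original code
subject   <- rep(c("A", "B", "C"), each = 8)
testDay   <- rep(1:8, times = 3)
variable1 <- rnorm(n = length(subject), mean = 30, sd = 10)
variable2 <- rnorm(n = length(subject), mean = 15, sd = 3)
df <- data.frame(subject, testDay, variable1, variable2)

# Group average at each test day: the values the *_avg plots draw
avg <- aggregate(cbind(variable1, variable2) ~ testDay, data = df, FUN = mean)
avg
```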
You can put your plotly object inside plotly's renderPlotly() function so that it resizes dynamically with the page (in a flexdashboard, render functions like this need runtime: shiny in the YAML header). See an example of how I used the function in this blog post:
https://medium.com/analytics-vidhya/shiny-dashboards-with-flexdashboard-e66aaafac1f2
I have a simple dataframe that looks like this:
df
steps numbers rate
1 clicks 332835 100.000000
2 signup 157697 47.379933
3 cart 29866 8.973215
4 buys 17012 5.111241
How can I plot a simple conversion funnel instead of a barchart?
If you must do the funnel thing, it's just a variation on a bar chart:
library(ggplot2)
library(reshape2) # for melt()
# get data
dat <- read.table(text=
"steps numbers rate
clicks 332835 100.000000
signup 157697 47.379933
cart 29866 8.973215
buys 17012 5.111241",
header = TRUE)
# add spacing, melt, sort
total <- subset(dat, rate==100)$numbers
dat$padding <- (total - dat$numbers) / 2
molten <- melt(dat[, -3], id.var='steps')
molten <- molten[order(molten$variable, decreasing = T), ]
molten$steps <- factor(molten$steps, levels = rev(dat$steps))
ggplot(molten, aes(x=steps)) +
geom_bar(aes(y = value, fill = variable),
stat='identity', position='stack') +
geom_text(data=dat,
aes(y=total/2, label= paste(round(rate), '%')),
color='white') +
scale_fill_manual(values = c('grey40', NA) ) +
coord_flip() +
theme(legend.position = 'none') +
labs(x='stage', y='volume')
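The funnel effect comes from the padding column: each visible bar is pushed along by half of the gap between its count and the top-of-funnel total, so every bar ends up centred on total / 2. The arithmetic for this data set, in base R:

```r
dat <- data.frame(
  steps   = c("clicks", "signup", "cart", "buys"),
  numbers = c(332835, 157697, 29866, 17012)
)

total <- dat$numbers[1]                   # top of the funnel (rate == 100)
dat$padding <- (total - dat$numbers) / 2  # invisible spacer bar
dat$rate    <- dat$numbers / total * 100  # recomputes the rate column

# spacer + visible bar + spacer always spans the full total
dat
```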
That said, there's no real point in a "funnel chart" - the same information can be presented in a plain bar chart with less fuss:
# get data
dat <- read.table(text=
"steps numbers rate
clicks 332835 100.000000
signup 157697 47.379933
cart 29866 8.973215
buys 17012 5.111241",
header = TRUE)
# order x axis
dat$steps <- factor(dat$steps, levels = dat$steps)
# plot
ggplot(dat, aes(x=steps, y=numbers)) +
geom_bar(stat='identity') +
geom_text(aes(label = paste(round(rate), '%')), vjust = -0.5)
Alternatively, you can draw a simple funnel with highcharter, the R interface to Highcharts. My data frame looks like this:
# data is a df called check_stage
check_stage
# A tibble: 9 × 4
stage_name count x percent
<ord> <int> <chr> <dbl>
1 Opportunity Disqualified 805 1 13.5
2 Qualifying 5138 2 86.5
3 Evaluation 1773 3 29.8
4 Meeting Scheduled 4104 4 69.1
5 Quoted 4976 5 83.7
6 Order Submitted 1673 6 28.2
7 Closed Won 1413 7 23.8
8 Closed Lost 957 8 16.1
9 Nurture 1222 9 20.6
library(highcharter)
# make a funnel plot of stage
options(highcharter.theme = hc_theme_smpl(tooltip = list(valueDecimals = 1,
pointFormat = "{point.y}%")))
hc <- check_stage %>%
hchart(
"funnel", hcaes(x = stage_name, y = percent), name = "Proportion of Leads")
hc