geom_area plot stacks areas by default - r

I am using geom_area to plot a very simple dataset. With geom_line everything looks normal, but when I switch to geom_area, higher values get plotted. I think looking at the graphs is the best way to show my problem:
require(tidyverse)
x <- structure(list(Time = 0:40, X15.DCIA = c(0, 1, 0.5, 0, 2, 2.5,
1, 0.5, 0, 1, 1.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 3,
5, 7, 6.5, 5.5, 4, 3, 2, 1.5, 1, 0.25, 0, 0, 0, 0, 0, 0, 0),
X100.DCIA = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1.5, 7, 8, 7.5, 6.5, 5, 3.5, 2.25,
1.75, 1.1, 0.4, 0.1, 0, 0, 0, 0, 0, 0)),
class = "data.frame", row.names = c(NA,-41L))
x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
geom_line(aes(color=prct.DCIA))
x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
geom_area(aes(fill=prct.DCIA))
The geom_line is what I expected (a line plot of my data).
But in the geom_area plot, X100.DCIA has jumped up to 15.
I am more interested in an explanation rather than a fix or workaround.
Note:
This can be a workaround:
x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
geom_polygon(aes(fill=prct.DCIA, alpha=0.5)) + guides(alpha=FALSE)

Explanation:
Your plots are stacking on top of one another.
The values you see following the red line in the geom_area graph are the sum of the values for the red and blue lines in your geom_line graph.
You can see this clearly if you separate out prct.DCIA with facet_grid():
x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
geom_area(aes(fill=prct.DCIA)) + facet_grid(.~prct.DCIA)
This is simply because position = "stack" is the default argument in geom_area:
geom_area(mapping = NULL, data = NULL, stat = "identity",
position = "stack", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...)
Presumably this is because geom_area is typically used when the filled area itself carries meaning, rather than just to shade under some lines. Bars and areas generally represent a count or quantity, so stacking them is natural, whereas points and lines usually represent a point estimate, and the area above or below them isn't meaningful.
Compare this with geom_line, whose default is position = "identity":
geom_line(mapping = NULL, data = NULL, stat = "identity",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...)
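To make the stacking concrete, here is a minimal base R sketch (not part of the original answer) that adds up the two series from the question at a few time points around the peak. The names `peak` and `stack_top` are made up for illustration; the values are copied from the question's data.

```r
# Values copied from the question's data at Time = 23..26
peak <- data.frame(
  Time      = 23:26,
  X15.DCIA  = c(3, 5, 7, 6.5),
  X100.DCIA = c(1.5, 7, 8, 7.5)
)

# position = "stack" draws one series on top of the other, so the upper
# edge of the top area is the sum of both series at each Time
peak$stack_top <- peak$X15.DCIA + peak$X100.DCIA

print(peak)
max(peak$stack_top)  # 15, the height seen in the geom_area plot
```

The maximum of 15 at Time = 25 is 7 + 8, which matches the jump described in the question.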
Fix:
If you use position = position_dodge(), the areas return to looking like the line graph, with the red area plotted behind the blue area:
x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
geom_area(aes(fill=prct.DCIA), position = position_dodge())
You can even set alpha < 1 and see this clearly:
x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
geom_area(aes(fill=prct.DCIA), position = position_dodge(), alpha = 0.5)
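As an alternative to position_dodge(), position = "identity" (the same default that geom_line uses) states the intent more directly: each area is drawn from zero up to its own values, with no stacking. A minimal sketch on a four-row-per-series subset of the question's data, stored under the hypothetical name `long`:

```r
library(ggplot2)

# Small subset of the question's data (Time = 23..26), already in long format
long <- data.frame(
  Time      = rep(23:26, times = 2),
  prct.DCIA = rep(c("X15.DCIA", "X100.DCIA"), each = 4),
  Vol       = c(3, 5, 7, 6.5, 1.5, 7, 8, 7.5)
)

# position = "identity" draws each area from 0 up to its own values;
# alpha < 1 keeps the area drawn underneath visible
p <- ggplot(long, aes(x = Time, y = Vol, fill = prct.DCIA)) +
  geom_area(position = "identity", alpha = 0.5)

print(p)
```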

Related

r ggplot2: fill area under curves with geom_step

I'm trying to fill area under each step function using ggplot2 and geom_step. Here's an example dataset:
time = c(0, 5, 8, 11, 14, 18, 20, 0, 3, 7, 13, 19, 20, 0, 4, 9, 15, 18)
prob = c(1, 0.95, 0.80, 0.62, 0.30, 0.03, 0, 1, 0.92, 0.75, 0.57, 0.21, 0, 1, 0.80, 0.64, 0.43, 0)
group = c(1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3)
df = data.frame(time, prob, group)
Here's the code I've tried:
plot1 = ggplot(df, aes(x = time, y = prob, group = group, fill = group)) +
geom_step()+
geom_ribbon(data = df, aes(ymin = 0, ymax = prob))
The problem is that, after filling the area, only group 1 has the step line, and the filled area does not follow the step function.
You may use geom_rect instead of geom_ribbon.
df %>%
mutate(group = as.factor(group)) %>%
ggplot(aes(x = time, y = prob, group = group, fill = group)) +
geom_step()+
geom_rect(aes(xmin = time, xmax = lead(time),
ymin = 0, ymax = prob), alpha = 0.4)
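One detail worth checking in the snippet above: lead(time) is evaluated over the whole data frame, so the last row of groups 1 and 2 takes its xmax from the first row of the next group (time 0). Those rows have prob = 0 in this dataset, so the stray rectangles have zero height and are invisible, but that is a property of this particular data. A hedged variation that computes the right edge per group (the column name `xmax` and the object `rects` are made up here) could look like this:

```r
library(dplyr)

time  <- c(0, 5, 8, 11, 14, 18, 20, 0, 3, 7, 13, 19, 20, 0, 4, 9, 15, 18)
prob  <- c(1, 0.95, 0.80, 0.62, 0.30, 0.03, 0, 1, 0.92, 0.75, 0.57, 0.21, 0,
           1, 0.80, 0.64, 0.43, 0)
group <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
df <- data.frame(time, prob, group)

# Compute each rectangle's right edge within its own group, then drop the
# last step of every group, which has no following time point
rects <- df %>%
  mutate(group = as.factor(group)) %>%
  group_by(group) %>%
  mutate(xmax = lead(time)) %>%
  ungroup() %>%
  filter(!is.na(xmax))

rects
```

The result can then be supplied to the rectangle layer, for example geom_rect(data = rects, aes(xmin = time, xmax = xmax, ymin = 0, ymax = prob), alpha = 0.4).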

Plotting multiple lines over time in ggplot2; hope to better distinguish lines

I am mainly posting because I really think I am overcomplicating this. I am creating a plot of 12 different lines over time. I would like each day to be represented on the x-axis with the "title" beneath each.
I've tried a few solutions and what I have "works", but it's not that good. Ignoring the placeholders I have in there, I would like points where the lines increase, and I would like it to be a little clearer where each person stands. My code seems a little long-winded; maybe there is a better way to do this.
riddle_log <- structure(list(date = structure(c(1559779200, 1559865600, 1560124800,
1560211200, 1560297600, 1560384000, 1560470400, 1560470400, 1560470400,
1560729600, 1560729600, 1560816000, 1560902400, 1560988800, 1561075200,
1561334400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
title = c("The Midget", "Bowling Balls", "Poisonous Ice",
"Dog Crosses River", "Camel Race", "Two Masked Men", "The Cabin",
"Black Truck", "Burglary", "Japanese Ship", "Haunted Floor",
"East and West", "Filling the Room", "Untied", "Window Jumper",
"Window Faller"), Brigid = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), Carly = c(0, 1, 1, 1, 2, 2, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3), Christian = c(1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 3, 3, 3, 3, 4, 4), Daniel = c(0, 0, 0, 0, 0, 1, 1,
2, 2, 2, 2, 3, 3, 3, 3, 3.5), Jess = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Luke = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Mara = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Marcus = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 2, 2, 3, 3, 3, 3.5), Nassim = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Nathalie = c(0, 0, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Neil = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA,
-16L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyverse)
library(ggthemes)
line1 <- riddle_log %>%
select(date, Brigid)
line2 <- riddle_log %>%
select(date, Carly)
line3 <- riddle_log %>%
select(date, Christian)
line4 <- riddle_log %>%
select(date, Daniel)
line5 <- riddle_log %>%
select(date, Jess)
line6 <- riddle_log %>%
select(date, Luke)
line7 <- riddle_log %>%
select(date, Mara)
line8 <- riddle_log %>%
select(date, Marcus)
line9 <- riddle_log %>%
select(date, Nassim)
line10 <- riddle_log %>%
select(date, Nathalie)
line11 <- riddle_log %>%
select(date, Neil)
ggplot() +
geom_line(data = line1, aes(x = date, y = Brigid, color = "a")) +
geom_line(data = line2, aes(x = date, y = Carly, color = "b")) +
geom_line(data = line3, aes(x = date, y = Christian, color = "c")) +
geom_line(data = line4, aes(x = date, y = Daniel, color = "d")) +
geom_line(data = line5, aes(x = date, y = Jess, color = "e")) +
geom_line(data = line6, aes(x = date, y = Luke, color = "f")) +
geom_line(data = line7, aes(x = date, y = Mara, color = "g")) +
geom_line(data = line8, aes(x = date, y = Marcus, color = "h")) +
geom_line(data = line9, aes(x = date, y = Nassim, color = "i")) +
geom_line(data = line10, aes(x = date, y = Nathalie, color = "j")) +
geom_line(data = line11, aes(x = date, y = Neil, color = "k")) +
scale_color_manual(name = "Analysts",
values = c("a" = "blue", "b" = "red", "c" = "orange", "d" = "black",
"e" = "steelblue", "f" = "blue", "g" = "blue", "h" = "blue",
"i" = "blue", "j" = "blue", "k" = "blue")) +
xlab('Date') +
ylab('Wins') +
ggtitle(" NAME ")
#+
# scale_x_date(breaks = as.Date(c("2019-05-01", "2019-08-15")))
# scale_x_discrete(name, breaks, labels, limits)
In short, I would like to add four things:
-All dates represented on the x-axis. Weekends are skipped, but I would not want them to leave gaps in the plot; they should be treated as consecutive days.
-If it's possible to have the title incorporated somehow, that would be cool, except I am struggling to think how, since some days have multiple titles.
-A clearer way to see each line's progress, as opposed to the bad overlap that's happening here.
-Points.
If there are any themes that are better suited for this type of problem I'm open for anything.
First of all, you are right that your code is "a little long winded". To take advantage of ggplot you should have your data in tidy ("tall") format, with one variable for "person" and another variable for each person's score. That is easy to accomplish using gather() in the tidyr package:
riddle_log2 <- riddle_log %>%
tidyr::gather("Analyst", "Wins", Brigid:Neil)
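As an aside, tidyr 1.0.0 and later mark gather() as superseded and recommend pivot_longer() for the same reshaping. A small sketch on a made-up two-analyst subset (the object `toy` is hypothetical, not the question's full data):

```r
library(tidyr)

# Hypothetical miniature of riddle_log with two analysts
toy <- data.frame(
  date   = as.Date(c("2019-06-06", "2019-06-07")),
  Carly  = c(0, 1),
  Daniel = c(0, 0)
)

# One row per date/analyst pair, the same long shape that gather() produces
toy_long <- pivot_longer(toy, cols = Carly:Daniel,
                         names_to = "Analyst", values_to = "Wins")
toy_long
```

The row order differs from gather() (pivot_longer() goes row by row rather than column by column), which does not matter for plotting.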
Now that the data are in the preferred format for ggplot, we can plot them much more easily, like this:
ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) +
geom_line(size = 2)
However, a lot of the lines are on top of each other. We can try to make the plot better by plotting the first persons (which are plotted first and will end up behind the other lines) with thicker lines, for instance like this:
ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) +
geom_line(aes(size = Analyst)) +
scale_size_manual(values = seq(4, 1, length = 11))
Now, this is slightly better. Next, we can improve the colors. There are a huge number of color palettes available for R. In cases such as this I often use the palettes of Paul Tol:
tol_colors = c("#332288", "#6699CC", "#88CCEE", "#44AA99", "#117733", "#999933",
"#DDCC77", "#661100", "#CC6677", "#882255", "#AA4499")
ggplot(riddle_log2) +
geom_line(aes(x = date, y = Wins, color = Analyst, size = Analyst)) +
scale_size_manual(values = seq(5, 1, length = 11)) +
scale_color_manual(values = tol_colors)
Now, this isn't perfect, but it is an improvement. What you should consider is splitting the plot into a set of subplots using facet_wrap():
gg <- ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) +
geom_line(size = 2) +
scale_color_manual(values = tol_colors) +
facet_wrap(~Analyst)
gg
This is a much better option in this case, I think.
Next, you also want the x axis to show all dates. There is too little space to show every single day, so here I will show labels for every second day:
gg +
scale_x_datetime(breaks = "2 day", date_labels = "%d. %b") +
theme(axis.text.x = element_text(hjust = 0, angle = -45))
As you can see, formatting labels isn't exactly straightforward, but it is very flexible. The codes that control how the date/time is shown are especially cryptic; in this case, %d indicates the day of the month and %b indicates the abbreviated month name. Other codes can be found by running ?strptime.
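The format codes can be tried out directly with format() before putting them into a scale. A minimal sketch, using an arbitrary timestamp chosen for illustration:

```r
# A fixed timestamp in UTC, so the result does not depend on the local time zone
d <- as.POSIXct("2019-06-07 00:00:00", tz = "UTC")

format(d, "%d")      # day of the month as two digits
format(d, "%m")      # month as a two-digit number
format(d, "%d. %b")  # the format used for the axis; the month name depends on the locale
```

Because %b is locale-dependent, the abbreviated month name shown on the axis follows the language settings of the R session.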
Finally, we're going to add the day's "title" every time the "Wins" score increases. We start by adding a variable 'Wins_increase' for the increase in Wins:
riddle_log2 <- riddle_log2 %>%
arrange(Analyst, date) %>% # Make sure the sorting is correct
group_by(Analyst) %>% # 'Wins_increase' will be calculated separately for every Analyst
mutate(Wins_increase = Wins - lag(Wins)) # How much 'Wins' has increased since the previous row
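To see what the grouped lag() produces, here is a small sketch on made-up scores (the data frame `toy` and its values are hypothetical):

```r
library(dplyr)

# Hypothetical scores for two analysts over four days
toy <- data.frame(
  Analyst = rep(c("A", "B"), each = 4),
  day     = rep(1:4, times = 2),
  Wins    = c(0, 1, 1, 2, 0, 0, 3, 3)
)

toy_inc <- toy %>%
  arrange(Analyst, day) %>%
  group_by(Analyst) %>%
  mutate(Wins_increase = Wins - lag(Wins)) %>%  # NA on each analyst's first day
  ungroup()

toy_inc$Wins_increase
```

Each analyst's first row is NA because there is no previous value within the group; a later filter(Wins_increase > 0) drops those rows along with the days that had no increase.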
Then we use geom_text() to add rotated labels:
gg + scale_x_datetime(breaks = "2 day", date_labels = "%d. %b") + # as before
theme(axis.text.x = element_text(hjust = 0, angle = -45)) + # as before
geom_text(data = riddle_log2 %>% filter(Wins_increase > 0), # Pick only where "Wins" is increasing
aes(y = Wins + 0.3, label = title), # We add 0.3 to lift the labels a bit
hjust = 0, angle = 90, size = 2) # Left-adjust and rotate labels
The next thing to fix is the overlap between labels for Marcus (because he won twice on the same day). This can be fixed using the ggrepel package.
Here's an example of converting to "long" data to make ggplot easier. I also added a geom_jitter layer to make it easier to see days with overlaps.
riddle_log %>%
tidyr::gather(Analyst, Wins, -c(date, title)) %>%
ggplot(aes(x = date, y = Wins, color = Analyst)) +
geom_line() +
geom_jitter( width = 0, shape = 21, alpha = 0.7) + # one way to show daily overlap
scale_color_manual(name = "Analysts",
values = c("Brigid" = "blue", "Carly" = "red",
"Christian" = "orange", "Daniel" = "black",
"Jess" = "steelblue", "Luke" = "blue",
"Mara" = "blue", "Marcus" = "blue",
"Nassim" = "blue", "Nathalie" = "blue",
"Neil" = "blue"))

How to change node shape and colour in ggraph

The first image is a hand drawn image and the second image is the same graph drawn using ggraph.
This is the code that generates the graph.
library(igraph)
library(tidyverse)
library(ggraph)
V <- read.table(text = "x y
2 1
4 2
4 4
2 5
6 4
3 7
8 6",
header = T) %>%
rownames_to_column("name")
E <- matrix(c(0, 1, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0), nrow = 7, byrow = T) %>%
data.frame() %>%
rename_all(list(function(x) 1:7)) %>%
rownames_to_column(var = "from") %>%
gather(to, val, 2:6) %>%
filter(val == 1) %>%
select(from, to)
nodeLables <- c(" I ", " N0", "F", "N1", "N2", "F1","F2")
g <- graph_from_data_frame(E, vertices = V, directed = F)
ggraph(g) +
geom_edge_link(edge_width = 1.3) +
geom_node_label(aes(label = nodeLables),label.r = unit(0.75, "lines"),
label.size = 0.65, label.padding = unit(0.55,"lines"), show.legend = F) +
ggtitle("My plot") +
coord_flip() +
expand_limits(x = 0, y = 0) +
# Using scale_x_reverse and swapping the limits
scale_x_reverse(expand = c(0, 0), limits = c(9, 0), breaks = c(0:9), minor_breaks = NULL) +
# switching y position to "right" (pre-flip)
scale_y_continuous(expand = c(0, 0),limits = c(0, 9), breaks = c(0:9), minor_breaks = NULL, position = "right") +
theme_minimal()
In the second graph the nodes are not perfect circles and the curvature changes with the size of the label. I want to make the nodes perfect circles and assign colours for different types of nodes. For example
I - Blue
F - Red
Nx - Green
Fx - Orange
I've played around with geom_node_point and geom_node_text and got the below result. However, I can't increase the node size further. How can I increase the node size?
nodeColours <- c("blue", "green", "red","green", "green", "orange","orange")
ggraph(g) +
geom_edge_link(edge_width = 1.3) +
geom_node_point(aes(size = 4), color = nodeColours)+
geom_node_text(aes(label = nodeLables))+
ggtitle("My plot") +
coord_flip() +
expand_limits(x = 0, y = 0) +
scale_x_reverse(expand = c(0, 0), limits = c(9, 0), breaks = c(0:9), minor_breaks = NULL) +
scale_y_continuous(expand = c(0, 0),limits = c(0, 9), breaks = c(0:9), minor_breaks = NULL, position = "right") +
theme_minimal()
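One thing that may explain the size limit: in the code above, size = 4 sits inside aes(), so it is treated as a data value to be mapped through the size scale rather than as a literal point size. A minimal sketch of the difference using plain ggplot2 follows. The data frame `nodes` is a made-up subset, and the assumption is that geom_node_point, which is documented as equivalent to geom_point, treats the argument the same way.

```r
library(ggplot2)

# Hypothetical subset of the node coordinates
nodes <- data.frame(x = c(2, 4, 4), y = c(1, 2, 4))

# size inside aes(): the constant 4 is mapped through the size scale
p_mapped <- ggplot(nodes, aes(x, y)) + geom_point(aes(size = 4))

# size outside aes(): a fixed point size, which can be made as large as needed
p_fixed <- ggplot(nodes, aes(x, y)) + geom_point(size = 12, colour = "blue")

print(p_fixed)
```

Under that assumption, moving size out of aes() in the ggraph call, for example geom_node_point(size = 12, color = nodeColours), would be the thing to try.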

How to flip x and y axes in ggraph R

The first image is a hand drawn (using MS word) image of a graph. The second image is an attempt to generate the same graph using ggraph.
Following is the code I used to automatically draw the graph when given the node-edge connections (the code was adapted from this thread). I want to move the x and y axes of the ggraph plot as shown in the hand drawn graph (image 1). Specifically, I want to reverse the x axis numbering so it runs from top to bottom, and move the y axis to the top. How do I go about it?
library(igraph)
library(tidyverse)
library(ggraph)
V <- read.table(text = "x y
2 1
4 2
4 4
2 5
6 4
3 7
8 6",
header = T) %>%
rownames_to_column("name")
E <- matrix(c(0, 1, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0), nrow = 7, byrow = T) %>%
data.frame() %>%
rename_all(list(function(x) 1:7)) %>%
rownames_to_column(var = "from") %>%
gather(to, val, 2:6) %>%
filter(val == 1) %>%
select(from, to)
g <- graph_from_data_frame(E, vertices = V, directed = F)
png("C:\\Users\\Yasoda\\Downloads\\rplot.png", width = 450, height = 450)
ggraph(g) +
geom_edge_link(edge_width = 1.3) +
geom_node_label(aes(label = name),label.r = unit(0.75, "lines"),
label.size = 0.65, label.padding = unit(0.55,"lines"), show.legend = F) +
ggtitle("My plot") +
coord_flip() +
expand_limits(x = 0, y = 0) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 9), breaks = c(0:9), minor_breaks = NULL) +
scale_y_continuous(expand = c(0, 0),limits = c(0, 9), breaks = c(0:9), minor_breaks = NULL) +
theme_minimal()
dev.off()
I tried using scale_x_reverse(), but it distorts the layout and gives the warning "Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.". I also tried the position = "top" option in scale_y_continuous, and it doesn't make a difference either.
ggraph(g) +
geom_edge_link(edge_width = 1.3) +
geom_node_label(aes(label = name),label.r = unit(0.75, "lines"),
label.size = 0.65, label.padding = unit(0.55,"lines"), show.legend = F) +
ggtitle("My plot") +
coord_flip() +
expand_limits(x = 0, y = 0) +
# Using scale_x_reverse and swapping the limits
scale_x_reverse(expand = c(0, 0), limits = c(9, 0), breaks = c(0:9), minor_breaks = NULL) +
# switching y position to "right" (pre-flip)
scale_y_continuous(expand = c(0, 0),limits = c(0, 9), breaks = c(0:9), minor_breaks = NULL, position = "right") +
theme_minimal()

Order histograms in ggplot by proportion depending only on the 'yes' value of another variable in R

I have data which looks like this
df <- data.frame (
cancer = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
CVD = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
diab = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
stroke = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0),
asthma = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0),
SR_hlt = c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1))
What I want to do is produce a bar plot, only for the people who have the disease of interest, where the bars of the bar plot are ordered by the proportion of people whose SR_hlt == 1.
To make this plot, I use the following code
1) Gather the data
df_grp <- df %>%
gather(key = condition, value = Y_N, -SR_hlt) %>%
group_by(condition, Y_N, SR_hlt) %>%
summarise(count = n()) %>%
mutate(freq = round(count/sum(count) * 100, digits = 1))
2) Plot this data
df_plot <- df_grp %>%
filter(Y_N == 1) %>%
ggplot(aes(x = reorder(condition, -freq), y = freq, fill = factor(SR_hlt)), width=0.5) +
geom_bar(stat="identity", position = position_dodge(0.9))
df_plot
The x = reorder(condition, -freq) should be the thing which orders the bars, but I don't think this is working in this case, because the freq values are dependent on the value of a third variable, SR_hlt. Is it possible to order the bars by the value of freq when the value of SR_hlt == 1?
This can be accomplished using the handy package forcats, specifically fct_reorder2:
df_plot <- df_grp %>%
filter(Y_N == 1) %>%
ggplot(aes(x = fct_reorder2(condition, SR_hlt, -freq),
y = freq, fill = factor(SR_hlt)), width=0.5) +
geom_bar(stat="identity", position = position_dodge(0.9))
df_plot
This is setting condition as a factor, and since SR_hlt == 1 is of interest, we arrange from low to high for SR_hlt, followed by -freq, or from high to low for freq.
Alternatively, you can set the factor before the ggplot call using standard dplyr only:
df_plot <- df_grp %>%
ungroup() %>%
filter(Y_N == 1) %>%
arrange(SR_hlt, desc(freq)) %>%
mutate(condition = factor(condition, unique(condition))) %>%
ggplot(aes(x = condition, y = freq, fill = factor(SR_hlt)), width=0.5) +
geom_bar(stat="identity", position = position_dodge(0.9))
df_plot
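In the above, I use arrange to sort the data frame by SR_hlt and then by descending freq, so the SR_hlt == 1 rows come first, ordered from highest to lowest freq. Next, I use mutate to take advantage of the sorted data frame by turning condition into a factor whose levels follow the order of first appearance.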
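The sort-then-factor idea can be checked in isolation with base R. In this sketch the data frame `toy` is a hypothetical three-condition summary, with freq values chosen for illustration:

```r
# Hypothetical summary: freq by condition and SR_hlt
toy <- data.frame(
  condition = rep(c("asthma", "cancer", "stroke"), times = 2),
  SR_hlt    = rep(c(1, 2), each = 3),
  freq      = c(60, 66.7, 25, 40, 33.3, 75)
)

# Sort by SR_hlt, then by descending freq, so SR_hlt == 1 rows come first
toy <- toy[order(toy$SR_hlt, -toy$freq), ]

# unique() keeps first appearances, so the levels follow the SR_hlt == 1 ranking
toy$condition <- factor(toy$condition, levels = unique(toy$condition))
levels(toy$condition)
```

The levels come out as cancer, asthma, stroke, which is the descending order of freq among the SR_hlt == 1 rows; ggplot then draws the x axis in that level order.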
