How to connect points of different groups by a line using ggplot - r

df<-data.frame(adjuster=c("Mary","Mary","Bob","Bob"), date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1")), value=c(10,15,25,15))
df
  adjuster       date value
1     Mary 2012-01-01    10
2     Mary 2012-02-01    15
3      Bob 2012-03-01    25
4      Bob 2012-04-01    15
ggplot(df,aes(x=date,y=value,color=adjuster))+geom_line()+geom_point()
In the above graph, notice the disconnect between the February and March points. How do I connect those points with a blue line, leaving the actual March point red? In other words, Mary should be associated with the values in [Jan, Mar) and Bob with those in [Mar, Apr].
EDIT: Turns out my example was overly simple. The answers listed don't generalize to the case where the adjuster changes between two people on more than one occasion. For example, consider
df<-data.frame(adjuster=c("Mary","Mary","Bob","Bob","Mary"), date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1","2012-5-1")), value=c(10,15,25,15,20))
  adjuster       date value
1     Mary 2012-01-01    10
2     Mary 2012-02-01    15
3      Bob 2012-03-01    25
4      Bob 2012-04-01    15
5     Mary 2012-05-01    20
Since I didn't mention this in my original question, I'll pick an answer that simply worked for my original data.

Updated to minimise tinkering with data.frame, added the group = 1 argument
Tinkered around with your data.frame a little. You should be able to automate the tinkering around, I guess. Let me know if you aren't. Also, your ggplot command wasn't working as per the chart you've posted in the question
df<-data.frame(
adjuster=c("Mary","Mary","Bob","Bob"),
date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1")),
value=c(10,15,25,15)
)
library(data.table)
library(ggplot2)
dt <- data.table(df)
dt[,adjuster := as.character(adjuster)]
dt[,prevadjuster := c(NA,head(adjuster,-1))]
dt[is.na(prevadjuster),prevadjuster := adjuster]
ggplot(dt) +
geom_line(aes(x=date,y=value, color = prevadjuster, group = 1)) +
geom_line(aes(x=date,y=value, color = adjuster, group = 1)) +
geom_point(aes(x=date,y=value, color = adjuster, group = 1))

I'd like to put forward a solution that does not require modifying the dataframe, that is intuitive (once you think about how the layers are drawn), and does not involve lines overwriting one another. It does, however, have one problem: it does not allow you to modify the linetype. I do not know why that is, so if someone could chime in to enlighten us, it would be great.
Quick answer to the OP:
ggplot(df, aes(x = date, y = value, color = adjuster))+
geom_line(aes(group = 1, colour = adjuster))+
geom_point(aes(group = adjuster, color = adjuster, shape = adjuster))
In the OP's dataframe, one can use group=1 to create a group spanning the whole period.
An example illustrated with figures:
# Create data
df <- structure(list(year = c(1990, 2000, 2010, 2020, 2030, 2040),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Something", class = "factor"),
value = c(4, 5, 6, 7, 8, 9), category = structure(c(1L, 1L, 1L,
2L, 2L, 2L), .Label = c("Observed", "Projected"), class = "factor")), .Names = c("year",
"variable", "value", "category"), row.names = c(NA, 6L), class = "data.frame")
# Load library
library(ggplot2)
The basic plot, similar to the OP, groups data by category both inside geom_point(aes()) and inside geom_line(aes()), with the undesirable result, in this application, that the line does not 'bridge' the two points across the two categories.
# Basic ggplot with geom_point() and geom_line()
p <- ggplot(data = df, aes(x = year, y = value, group = category)) +
geom_point(aes(colour = category, shape = category), size = 4) +
geom_line(aes(colour = category), size = 1)
ggsave(p, file = "ggplot-points-connect_p1.png", width = 10, height = 10)
The key to my solution is to group by variable but to colour by category inside geom_line(aes()):
# Modified version to connect the dots "continuously" while preserving color grouping
p <- ggplot(data = df, aes(x = year, y = value)) +
geom_point(aes(group = category, colour = category, shape = category), size = 4) +
geom_line(aes(group = variable, colour = category), size = 1)
ggsave(p, file = "ggplot-points-connect_p2.png", width = 10, height = 10)
However, sadly, with this approach it is not currently possible to control the linetype, as far as I can make out:
ggplot(data = df, aes(x = year, y = value)) +
geom_point(aes(group = category, colour = category, shape = category), size = 4) +
geom_line(aes(group = variable, colour = category), linetype = "dotted", size = 1)
## Error: geom_path: If you are using dotted or dashed lines, colour, size and linetype must be constant over the line
Remark: I'm using another dataframe because I'm copy-pasting from something I was doing and that made me visit this question -- this way I can upload my images.
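A possible workaround for the linetype limitation, offered as a sketch rather than a definitive fix: draw each piece of the line as its own geom_segment(), so every segment has a single colour and a dotted linetype is accepted. The helper data frame segs and its columns (x, y, xend, yend) are names introduced here for illustration; each segment is coloured by the category of its starting point, which is an assumption about the desired look.

```r
library(ggplot2)

# Same shape of data as in the answer above (year, value, category).
df <- data.frame(
  year = c(1990, 2000, 2010, 2020, 2030, 2040),
  value = c(4, 5, 6, 7, 8, 9),
  category = rep(c("Observed", "Projected"), each = 3)
)

# One row per line piece: from point i to point i + 1,
# coloured by the category of the starting point (an assumption).
n <- nrow(df)
segs <- data.frame(
  x = df$year[-n], y = df$value[-n],
  xend = df$year[-1], yend = df$value[-1],
  category = df$category[-n]
)

p <- ggplot(df, aes(x = year, y = value)) +
  geom_segment(
    data = segs,
    aes(x = x, y = y, xend = xend, yend = yend, colour = category),
    linetype = "dotted"
  ) +
  geom_point(aes(colour = category, shape = category), size = 4)
```

Because every segment is a separate graphical element, the "must be constant over the line" check of geom_path does not apply here.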

Here is a simple solution. No need to change the original data.frame.
# aes() with bare column names replaces the deprecated aes_string()
ggplot()+
geom_line(aes(x=date,y=value), data=df, lty=2)+
geom_point(aes(x=date,y=value, color=adjuster), data=df)+
geom_line(aes(x=date,y=value, color=adjuster), data=df)
That's one of my favorite features of ggplot. You can layer your plots one on top of the other pretty cleanly.

I came up with a solution that combines ideas from Codoremifa and JAponte.
df<-data.frame(adjuster=c("Mary","Mary","Bob","Bob"), date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1")), value=c(10,15,25,15))
df$AdjusterLine<-df$adjuster
df[2:nrow(df),]$AdjusterLine<-df[1:(nrow(df)-1),]$adjuster
ggplot(df)+geom_line(aes(x=date,y=value, color=AdjusterLine), lty=2)+geom_line(aes(x=date,y=value, color=adjuster))+geom_point(aes(x=date,y=value, color=adjuster))
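For the case raised in the question's EDIT, where the adjuster changes more than once, one approach that might generalize is to build one segment per pair of consecutive points and colour it by the adjuster of the starting point. This is only a sketch under that assumption; the segs data frame and its column names are introduced here for illustration and are not part of any answer above.

```r
library(ggplot2)

# Data from the question's EDIT: the adjuster switches twice.
df <- data.frame(
  adjuster = c("Mary", "Mary", "Bob", "Bob", "Mary"),
  date = as.Date(c("2012-1-1", "2012-2-1", "2012-3-1", "2012-4-1", "2012-5-1")),
  value = c(10, 15, 25, 15, 20)
)

# One row per connecting segment, from point i to point i + 1,
# coloured by the adjuster responsible at the start of the interval.
n <- nrow(df)
segs <- data.frame(
  x = df$date[-n], y = df$value[-n],
  xend = df$date[-1], yend = df$value[-1],
  adjuster = df$adjuster[-n]
)

p <- ggplot(df, aes(x = date, y = value, colour = adjuster)) +
  geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_point()
```

With this layout the February-to-March segment takes Mary's colour while the March point keeps Bob's, and the April-to-May segment takes Bob's colour while the May point is Mary's.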

Related

Adding geom_line between data points with different geom_boxplot fill variable

Hi I have a much larger data frame but a sample dummy df is as follows:
set.seed(23)
df = data.frame(name = c(rep("Bob",8),rep("Tom",8)),
topic = c(rep(c("Reading","Writing"),8)),
subject = c(rep(c("English","English","Spanish","Spanish"),4)),
exam = c(rep("First",4),rep("Second",4),rep("First",4),rep("Second",4)),
score = sample(1:100,16))
I need to plot it as shown in the picture below (made from my original data frame), but with lines connecting each name's scores between the First and Second levels of the exam variable. I tried geom_line(aes(group=name)), but the lines are not connected in the right way. Is there a way to connect the points that also respects the grouping by the fill variable, similar to how position_dodge() separates the points by their fill grouping? Thanks a lot!
library(ggplot2)
ggplot(df, aes(x=topic,y=score,fill=exam)) +
geom_boxplot(outlier.shape = NA) +
geom_point(size=1.75,position = position_dodge(width = 0.75)) +
facet_grid(~subject,switch = "y")
One option to achieve your desired result is to group the lines by name and topic and dodge the lines manually instead of relying on position_dodge. To this end, convert topic to a numeric for the geom_line and shift the position by the amount needed to align the lines with the dodged points:
set.seed(23)
df <- data.frame(
name = c(rep("Bob", 8), rep("Tom", 8)),
topic = c(rep(c("Reading", "Writing"), 8)),
subject = c(rep(c("English", "English", "Spanish", "Spanish"), 4)),
exam = c(rep("First", 4), rep("Second", 4), rep("First", 4), rep("Second", 4)),
score = sample(1:100, 16)
)
library(ggplot2)
ggplot(df, aes(x = topic, y = score, fill = exam)) +
geom_boxplot(outlier.shape = NA) +
geom_point(size = 1.75, position = position_dodge(width = 0.75)) +
geom_line(aes(
x = as.numeric(factor(topic)) + .75 / 4 * ifelse(exam == "First", -1, 1),
group = interaction(name, topic)
)) +
facet_grid(~subject, switch = "y")
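The shift of .75 / 4 used above follows from splitting a dodge width of 0.75 into two equal slots and moving each line to the centre of its slot. As a rough sketch of how the same arithmetic could extend to more than two fill groups, the hypothetical helper dodge_offset() below (not a ggplot2 function) returns the slot centres relative to the category position, assuming equally wide slots.

```r
# Hypothetical helper: centre offsets of n_groups equally wide slots
# that together span `width`, measured from the category position.
dodge_offset <- function(n_groups, width = 0.75) {
  width * ((seq_len(n_groups) - 0.5) / n_groups - 0.5)
}

offsets_two <- dodge_offset(2)    # one offset per exam level
offsets_three <- dodge_offset(3)  # what three fill groups would need
```

For two groups this gives minus and plus 0.1875, which matches the .75 / 4 shift applied to the "First" and "Second" exams in the answer.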

display mean value (rearrange data frame?)

I want to boxplot two groups (A and B) and display the mean value on each box plot.
I have 30 lines and 2 columns : each line contains the value of group A (col 1) and group B (col 2).
I first made a boxplot with the base graphics boxplot() function:
boxplot(Data_Q4$Group.A,Data_Q4$Group.B,names=c("group A","group B"))
but it seems that adding a mean point to the boxplot requires ggplot2.
I tried many things, but I always get an error message:
! Aesthetics must be either length 1 or the same as the data (30): x...
The problem seems to come from the y axis. I need it to take the data from both columns A and B, but I don't know how to do this.
If my data had a value column and a group column (A or B for each line), it would work, but I don't know how to rearrange it so that I get 2 columns (value and group) and 60 lines containing the values of both groups.
Then I could do dataQ4 %>% ggplot(aes(x=group,y=value))+geom_boxplot()+stat_summary(fun=mean)
and I think it would be OK.
So my problem is how to rearrange my data frame so that I can use ggplot to boxplot it.
Thanks for your help!
I share here my data :
dput(Data_Q4) structure(list(Group.A = c(1.25310535, 0.5546414, 0.301283, 1.29312466, 0.99455579, 0.5141743, 2.0078324, 0.42224244, 2.17877257, 3.21778902, 0.55782935, 0.59461765, 0.97739581, 0.20986658, 0.30944786, 1.10593627, 0.77418776, 0.08967408, 1.10817666, 0.24726425, 1.57198685, 4.83281274, 0.43113213, 2.73038931, 1.13683142, 0.81336825, 0.83700649, 1.7847654, 2.31247163, 2.90988727), Group.B = c(2.94928948, 0.70302878, 0.69016263, 1.25069011, 0.43649776, 0.22462232, 0.39231981, 1.5763435, 0.42792839, 0.19608026, 0.37724368, 0.07071508, 0.03962611, 0.38580831, 2.63928857, 0.78220807, 0.66454197, 0.9568569, 0.02484568, 0.21600677, 0.88031195, 0.13567357, 0.68181725, 0.20116062, 0.4834762, 0.50102846, 0.15668497, 0.71992076, 0.68549794, 0.86150777)), class = "data.frame", row.names = c(NA, -30L))
First I create some random data:
df <- data.frame(group = rep(c("A", "B"), 15),
value = runif(30, 0, 10))
You can use the following code:
library(tidyverse)
ggplot(data = df,
aes(x = group, y = value)) +
geom_boxplot() +
# `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
stat_summary(fun = mean, color = "darkred", position = position_dodge(0.75),
geom = "point", shape = 18, size = 3,
show.legend = FALSE)
Output:
The red dots represent the mean.
Using your data:
You can use the following code:
library(tidyverse)
library(reshape)
# Data_Q4 is the wide data frame from the question's dput() output
Data_Q4 %>%
melt() %>%
ggplot(aes(x = variable, y = value)) +
geom_boxplot() +
stat_summary(fun = mean, color = "darkred", position = position_dodge(0.75),
geom = "point", shape = 18, size = 3,
show.legend = FALSE)
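If you would rather not load an extra package for the reshaping step, base R's stack() can produce the same long layout. The sketch below uses only the first three rows of the question's data to stay short; the names wide, long and group_means are introduced here for illustration.

```r
# First three rows of the question's data, kept short for illustration.
wide <- data.frame(
  Group.A = c(1.25310535, 0.5546414, 0.301283),
  Group.B = c(2.94928948, 0.70302878, 0.69016263)
)

# stack() returns two columns: `values` and `ind` (the source column name).
long <- stack(wide)
names(long) <- c("value", "group")

# Per-group means, i.e. the quantity the red points mark on the boxplots.
group_means <- tapply(long$value, long$group, mean)
```

The long data frame has one row per observation, so it can be passed to ggplot(aes(x = group, y = value)) directly.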

How to specify unique geom assignments to facets?

Below I have simulated a dataset where an assignment was given to 5 groups of individuals on 5 different days (a new group with 200 new individuals each day). TrialStartDate denotes the date on which the assignment was given to each individual (ID), and TrialEndDate denotes when each individual finished the assignment.
set.seed(123)
data <-
data.frame(
TrialStartDate = rep(c(sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by="day"), 5)), each = 200),
TrialFinishDate = sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by = "day"), 1000,replace = T),
ID = seq(1,1000, 1)
)
I am interested in comparing how long individuals took to complete the trial depending on when they started the trial (i.e., assuming TrialStartDate has an effect on the length of time it takes to complete the trial).
To visualize this, I want to make a barplot showing counts of IDs on each TrialFinishDate where bars are colored by TrialStartDate (since each TrialStartDate acts as a grouping variable). The best I have come up with so far is by faceting like this:
library(dplyr)    # provides %>%, group_by() and count()
library(ggplot2)
data %>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
facet_wrap(~TrialStartDate, ncol = 1)
However, I also want to add a vertical line to each facet showing when the TrialStartDate was for each group (preferably colored the same as the bars). When attempting to add vertical lines with geom_vline, it adds all the lines to each facet:
data%>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
geom_vline(xintercept = unique(data$TrialStartDate))+
facet_wrap(~TrialStartDate, ncol = 1)
How can we make the vertical lines unique to the respective group in each facet?
You're specifying xintercept outside of aes, so the faceting is not respected.
This should do the trick:
data %>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
geom_vline(aes(xintercept = TrialStartDate))+
facet_wrap(~TrialStartDate, ncol = 1)
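Note the change: xintercept is now mapped inside aes(), as in geom_vline(aes(xintercept = TrialStartDate)), so each facet only receives the start date found in its own subset of the data.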
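A variation that may be worth trying, sketched under the assumption that one line per facet is wanted and that it should share the bar colour: give geom_vline() its own small data frame with one row per start date and map colour there. The names counts and starts are introduced here for illustration; geom_col() is used as the equivalent of geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

set.seed(123)
data <- data.frame(
  TrialStartDate = rep(sample(seq(as.Date("2019/02/01"), as.Date("2019/02/15"), by = "day"), 5), each = 200),
  TrialFinishDate = sample(seq(as.Date("2019/02/01"), as.Date("2019/02/15"), by = "day"), 1000, replace = TRUE),
  ID = seq(1, 1000, 1)
)

# Counts per start/finish date pair, as in the question.
counts <- count(data, TrialStartDate, TrialFinishDate)

# One row per facet, so each panel gets exactly one vertical line.
starts <- data.frame(TrialStartDate = unique(data$TrialStartDate))

p <- ggplot(counts, aes(x = TrialFinishDate, y = n, fill = factor(TrialStartDate))) +
  geom_col() +
  geom_vline(
    data = starts,
    aes(xintercept = TrialStartDate, colour = factor(TrialStartDate))
  ) +
  facet_wrap(~TrialStartDate, ncol = 1)
```

Because the faceting variable is present in starts, facet_wrap() routes each row to its matching panel.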

Replicating a trending chart with ggplot

I recently saw a chart I want to replicate in R. The chart shows a score or other measurement for multiple records as a colored box, binned into one of, say, 4 colors. In my image it is red, light red, light green, and green. So each record gets one box for each score they have - the idea is that each record had one score for a given point in time over several points in time. In my example, I'll use student test scores over time, so say we have 4 students and 8 tests throughout the year (in chronological order) we would have 8 boxes for each student, resulting in 32 boxes. Each row (student) would have 8 boxes.
Here is how I created some example data:
totallynotrealdata <- data.frame(Student = c(rep("A",8),rep("B",8),rep("C",8),rep("D",8)),Test = rep(1:8,4), Score = sample(1:99,32,replace = TRUE))
# BinnedScore is added afterwards because it depends on the Score column
totallynotrealdata$BinnedScore <- cut(totallynotrealdata$Score,breaks = c(0,25,50,75,100),labels = c(1,2,3,4))
What I'm wondering is how I can recreate this chart in ggplot? Any geoms I should look at?
You could play with geom_rect(). This is very basic but I guess you can easily optimize it for your purposes:
df <- data.frame(Student = c(rep(1,8),rep(2,8),rep(3,8),rep(4,8)),
Test = rep(1:8,4),
Score = sample(1:99,32,replace = TRUE))
df$BinnedScore <- cut(df$Score,breaks = c(0,25,50,75,100),labels = c(1,2,3,4))
df$Student <- factor(df$Student, labels = LETTERS[1:length(unique(df$Student))])
library(ggplot2)
colors <- c("#f23d2e", "#e39e9c", "#bbd3a8", "#68f200")
numStuds <- length(levels(df$Student))
numTests <- max(df$Test)
ggplot() + geom_rect(data = df, aes(xmin = Test-1, xmax = Test, ymin = as.numeric(Student)-1, ymax = as.numeric(Student)), fill = colors[df$BinnedScore], col = grey(0.5)) +
xlab("Test") + ylab("Student") +
scale_y_continuous(breaks = seq(0.5, numStuds, 1), labels = levels(df$Student)) +
scale_x_continuous(breaks = seq(0.5, numTests, 1), labels = 1:numTests)
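An alternative worth considering is geom_tile(), which places one box per Student/Test pair without computing the rectangle corners by hand. This is a minimal sketch and not the approach used above: the seed, the bin_colors name and the choice of a discrete x axis are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- data.frame(
  Student = rep(LETTERS[1:4], each = 8),
  Test = rep(1:8, 4),
  Score = sample(1:99, 32, replace = TRUE)
)
df$BinnedScore <- cut(df$Score, breaks = c(0, 25, 50, 75, 100),
                      labels = c("1", "2", "3", "4"))

# Named vector, so each bin keeps its colour even if a bin is empty.
bin_colors <- c("1" = "#f23d2e", "2" = "#e39e9c", "3" = "#bbd3a8", "4" = "#68f200")

p <- ggplot(df, aes(x = factor(Test), y = Student, fill = BinnedScore)) +
  geom_tile(colour = "grey50") +
  scale_fill_manual(values = bin_colors) +
  labs(x = "Test", y = "Student")
```

Mapping fill through a scale also yields a legend for the bins, which the fixed fill argument in the geom_rect() version does not.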

Overlaying whiskers or error-bar-esque lines on a ggplot

I am creating plots similar to the first example image below, and need plots like the second example below.
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# plot 2015 data
ggplot(data.2015, aes(x = area, y = score, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major)
The data.2014 has only values for the "Findings" group. I would like to show those 2014 Findings values on the plot, on the appropriate/corresponding data.2015$area, where there is 2014 data available.
To show last year's data just on the "Finding" (red bars) data, I'd like to use a one-sided errorbar/whisker that emanates from the value of the relevant data.2015 bar, and terminates at the data.2014 value, for example:
I thought to do this by using layers and plotting error bars so that the 2015 data could overlap, however this doesn't work when the 2014 result is abs() smaller than the 2015 result and is thus occluded.
Considerations:
I'd like the errorbar/whisker to be the same width as the bars, perhaps even dashed line with a solid cap.
Bonus points for a red line when the value has decreased, and green when the value has increased
I generate lots of these plots in a loop, sometimes with many groups, with a different amount of areas in each plot. The 2014 data is (at this stage) always displayed only for a single group, and every area has some data (except for just one NA case, but need to provision for that scenario)
EDIT
So I've added to the solution below. I used that exact code, but used geom_linerange to add the lines without caps, and then also used geom_errorbar with ymin and ymax set to the same value to draw the cap. The result is a one-sided error bar on a ggplot geom_bar! Thanks for the help.
I believe you can get most of what you want with a little data manipulation. Doing an outer join of the two datasets will let you add the error bars with the appropriate dodging.
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
To make the error bar one-sided, you'll want ymin to be either the same as y or NA depending on the group. It seemed easiest to make a new variable, which I called plotscore, to achieve this.
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
The last thing I did is to make a variable direction for when the 2015 score decreased vs increased compared to 2014. I included a third category for the Benchmark group as filler because I ran into some issues with the dodging without it.
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
The dataset used for plotting would look like this:
    area     group score.2015 score.2014 plotscore direction
1  first Benchmark        -40         NA        NA    absent
2  first  Findings        -50        -30       -50       dec
3 second Benchmark        -10         NA        NA    absent
4 second  Findings         20         40        20       dec
5  third Benchmark         60         NA        NA    absent
6  third  Findings         15        -15        15       inc
The final code I used looked like this:
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(aes(ymin = plotscore, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))
I'm using the development version of ggplot2, ggplot2_1.0.1.9002, and show_guide is now deprecated in favor of show.legend, which I used in geom_errorbar.
I obviously didn't change the line type of the error bars to dashed with a solid cap, nor did I remove the bottom whisker as I don't know an easy way to do either of these things.
In response to a comment suggesting I add the full solution as an answer:
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# reconfigure data to create values for the additional errorbar/linerange
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
# set the data min and max as the same to have a single 'cap' with no line
geom_errorbar(aes(ymin = score.2014, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
#then add the line
geom_linerange(aes(ymin = score.2015, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))
