Why does ggplot2 allow grouping by invalid aesthetic mappings?

I'm having trouble understanding how grouping works in ggplot. Suppose I have a dataframe with date, value, categorical, and a dichotomous variable like so:
library(ggplot2)
set.seed(42)
data <- data.frame(date = sample(seq(as.Date('2015/01/01'), as.Date('2020/01/01'), by="day"), 20),
values = rnorm(20, 0, 1),
categories = sample(c("a", "b", "c"), 20, replace = TRUE))
data$pre_post <- ifelse(data$date <= '2018/01/01', "pre", "post")
If I group by the 'pre_post' variable (dichotomous) using aes(group =), I get:
ggplot(data, aes(x = date, y = values, color = categories, group = pre_post)) +
geom_line()
Using aes(by =), the results change to:
ggplot(data, aes(x = date, y = values, color = categories, by = pre_post)) +
geom_line()
In my current use case, 'by=' gives me the desired results, but I don't know how to explain what exactly it is doing and why. This is especially puzzling since, as @markus points out, 'by=' isn't even a valid aesthetic, and replacing it with 'foo=' does the same thing:
ggplot(data, aes(x = date, y = values, color = categories, foo = pre_post)) +
geom_line()

If I understand the examples and explanations in Aesthetics: grouping of the ggplot2 documentation correctly, the group aesthetic draws a separate line for each subject.
group = pre_post seems to take precedence over the grouping implied by the color aesthetic, forcing ggplot2 to draw a single line connecting all "pre" data points and another line connecting all "post" data points.
In order to get six different lines for each combination of categories and pre_post you may follow Axeman's suggestion to use interaction():
ggplot(data, aes(x = date, y = values, color = categories, group = interaction(categories, pre_post))) +
geom_line()
or you may use a different aesthetic like linetype:
ggplot(data, aes(x = date, y = values, color = categories, linetype = pre_post)) +
geom_line()
Using an undefined aesthetic like by or foo causes the data to be grouped as well, but without any effect on the visual appearance (no change in color, linetype, etc.).
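To make the grouping behaviour described above easier to inspect, here is a minimal sketch that counts the groups ggplot2 computes for each mapping. The data frame toy and the helper count_groups() are hypothetical and only serve as illustration; layer_data() is the ggplot2 function that returns the data a layer is built from. The counts in the comments assume the behaviour reported in the question, where a discrete variable mapped to an unknown aesthetic in ggplot() still takes part in the default grouping.

```r
library(ggplot2)

# Hypothetical toy data: 2 categories x 2 periods, 2 points per combination
toy <- data.frame(
  x = 1:8,
  y = c(1, 3, 2, 4, 2, 5, 3, 6),
  categories = rep(c("a", "b"), times = 4),
  pre_post = rep(c("pre", "post"), each = 4)
)

# Hypothetical helper: number of distinct groups in the built line layer
count_groups <- function(mapping) {
  p <- ggplot(toy, mapping) + geom_line()
  length(unique(layer_data(p)$group))
}

# Default grouping: one group per level of the discrete colour variable
n_colour_only <- count_groups(aes(x, y, colour = categories))

# An explicit group overrides the default grouping
n_group_only <- count_groups(aes(x, y, colour = categories, group = pre_post))

# One group per observed combination of categories and pre_post
n_interaction <- count_groups(
  aes(x, y, colour = categories, group = interaction(categories, pre_post))
)

# Unknown aesthetic: assumed to behave like the interaction() grouping
n_unknown <- count_groups(aes(x, y, colour = categories, foo = pre_post))

print(c(colour_only = n_colour_only, group_only = n_group_only,
        interaction = n_interaction, unknown = n_unknown))
```

If the unknown aesthetic yields the same count as the interaction() version, that supports the explanation that foo = pre_post is simply treated as one more discrete variable when the default groups are formed.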
However, grouping a line chart may lead to a loss of information as can be seen here:
ggplot(data, aes(x = date, y = values, color = categories)) +
geom_line()
Without grouping by pre_post the blue line extends into 2018 which was not visible on the previous plots.
To avoid the loss of information I prefer to plot the data points as well as the lines in order to mark the end points of the line segments or to show a single data point:
ggplot(data, aes(x = date, y = values, color = categories, linetype = pre_post, shape = pre_post)) +
geom_line() +
geom_point() +
geom_label(aes(label = date), data = data[data$categories == "c" & data$pre_post == "post", ],
hjust = -0.1, vjust = -0.1, show.legend = FALSE)
geom_label() is optional and only used in this post to highlight the single data point.

Related

stack bars by an ordering variable which is numeric ggplot

I am trying to create a swimlane plot of different subjects' doses over time. When I run my code, the bars are stacked by dose amount. My issue is that a subject's doses vary: they could have 5, 10, 5, and in my plot the 5s are stacked together. But I want them represented in the order they happened over time. In my dataset I have the amount of time each patient was on a dose, ordered by when they had the dose. I want my bars stacked by an ordering variable called "p", which is numeric (1, 2, 3, 4, 5, 6, etc.) and indicates at which visit the subject had that dose.
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(EXDOSE))) +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
I want the bars stacked by my variable "p", not by fill.
I tried forcats, but that does not work. I am unsure how to go about this; the data in the dataset is arranged by p for each subject.
example data
dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "1034","1034","1034","1034"),
exdose = c(5,10,20,5,5,10,20,20),
p= c(1,2,3,4,1,2,3,4),
diff = c(3,3,9,7,3,3,4,5)
)
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(exdose)),position ="stack") +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
If you want to order your stacked bar chart by p you have to tell ggplot2 to do so by mapping p on the group aesthetic. Otherwise ggplot2 will make a guess which by default is based on the categorical variables mapped on any aesthetic, i.e. in your case the fill aes:
Note: I dropped the scale_fill_manual as you did not provide the vector of colors. But that's not important for the issue.
library(ggplot2)
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)))
EDIT And to get the right order we have to reverse the order of the stack which could be achieved using position_stack(reverse = TRUE):
Note: To check that we have the right order I added a geom_text showing the p value.
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE)) +
geom_text(aes(label = p), position = position_stack(reverse = TRUE))
A second option would be to convert p to a factor with the order of its levels reversed:
ggplot(dataset, aes(x = diff + 1, y = subject, group = factor(p, rev(sort(unique(p)))))) +
geom_col(aes(fill = as.factor(exdose))) +
geom_text(aes(label = p), position = "stack")
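As a cross-check on the stacking order, the positions each bar segment should occupy can be computed directly from the example data. This is only a sketch in base R; the columns len, seg_start and seg_end are hypothetical helper names, not part of the original answer.

```r
# Example data from the question
dataset <- data.frame(
  subject = c("1002", "1002", "1002", "1002", "1034", "1034", "1034", "1034"),
  exdose  = c(5, 10, 20, 5, 5, 10, 20, 20),
  p       = c(1, 2, 3, 4, 1, 2, 3, 4),
  diff    = c(3, 3, 9, 7, 3, 3, 4, 5)
)

# Sort by subject and visit so the cumulative sum follows visit order
dataset <- dataset[order(dataset$subject, dataset$p), ]

# Segment length as plotted in the answer (x = diff + 1)
dataset$len <- dataset$diff + 1

# Where each segment should end and start along the x axis, per subject
dataset$seg_end   <- ave(dataset$len, dataset$subject, FUN = cumsum)
dataset$seg_start <- dataset$seg_end - dataset$len

print(dataset[, c("subject", "p", "exdose", "seg_start", "seg_end")])
```

If the plot is stacked in visit order, the segment labelled p = 1 should start at 0 for each subject and the later visits should follow at the computed start positions.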

Time series data using ggplot: how to use a different color for each time point and also connect with lines the data belonging to each subject?

I have data from several cells which I tested in several conditions: a few times before and also a few times after treatment. In ggplot, I use color to indicate different times of testing.
Additionally, I would like to connect with lines all data points which belong to the same cell. Is that possible?...
Here is my example data (https://www.dropbox.com/s/eqvgm4yu6epijgm/df.csv?dl=0) and a simplified code for the plot:
df$condition = as.factor(df$condition)
df$cell = as.factor(df$cell)
df$condition <- factor(df$condition, levels = c("before1", "before2", "after1", "after2", "after3"))
windows(width=8,height=5)
ggplot(df, aes(x=condition, y=test_variable, color=condition)) +
labs(title="", x = "Condition", y = "test_variable", color="Condition") +
geom_point(aes(color=condition),size=2,shape=17, position = position_jitter(w = 0.1, h = 0))
I think your code is heading in the wrong direction: you should instead group the points by the column Cell. Then, if I understand correctly, you want to see how the variable evolves for each cell before and after a treatment, so you can order the x variable using scale_x_discrete.
Altogether, you can do something like this:
library(ggplot2)
ggplot(df, aes(x = condition, y = variable, group = Cell)) +
geom_point(aes(color = condition))+
geom_line(aes(color = condition))+
scale_x_discrete(limits = c("before1","before2","after1","after2","after3"))
Does it look like what you are expecting?
Data
df = data.frame(Cell = c(rep("13a",5),rep("1b",5)),
condition = rep(c("before1","before2","after1","after2","after3"),2),
variable = c(58,55,36,29,53,57,53,54,52,52))

How to create a heatmap with continuous scale using ggplot2 in R

I have got a data frame with several 1000 rows in the form of
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
and would like to make a kind of heatmap in which one axis has a continuous scale (position). The color column is categorical. However, due to the large number of data points, I want to use binning, i.e. treat it as a continuous variable.
This is more or less how the plot should look:
I can't think of a way to create such a plot using ggplot2/R. I have tried several geometries, e.g. geom_point()
ggplot(data=df, aes(x=group, y=pos, color=color)) +
geom_point() +
scale_colour_gradientn(colors=c("yellow", "black", "orange"))
Thanks for your help in advance.
Does this help you?
library(ggplot2)
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
ggplot(data = df, aes(x = group, y = pos)) + geom_tile(aes(fill = color))
Looks like this
Improved version with 3 color gradient if you like
library(scales)
ggplot(data = df, aes(x = group, y = pos)) +
geom_tile(aes(fill = color)) +
scale_fill_gradientn(colours = c("orange", "black", "yellow"),
values = rescale(c(1, 2, 3)),
guide = "colorbar")
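The question also mentions binning along the position axis, which the answer above does not cover. One possible way to do it, sketched here in base R, is to cut pos into intervals and average the color value per group and bin. The bin width of 5 and the names bin_width, pos_bin and binned are assumptions made for illustration, and averaging only makes sense if the color codes can be treated as continuous, as the question suggests.

```r
# Example data from the question, written compactly with rep()
group <- rep(c("gr1", "gr2", "gr3"), each = 10)
pos   <- rep(1:10, times = 3)
color <- c(2, 2, 2, 2, 3, 3, 2, 2, 3, 2,
           1, 2, 2, 2, 1, 1, 1, 1, 1, 1,
           2, 2, 2, 2, 2, 2, 1, 1, 2, 2)
df <- data.frame(group, pos, color)

# Assumed bin width; choose something larger for several thousand rows
bin_width <- 5

# Assign each position to an interval such as (0,5] or (5,10]
df$pos_bin <- cut(df$pos, breaks = seq(0, max(df$pos), by = bin_width))

# Mean color value per group and position bin
binned <- aggregate(color ~ group + pos_bin, data = df, FUN = mean)
print(binned)
```

The resulting binned data frame could then be passed to the geom_tile() call from the answer, with y = pos_bin in place of y = pos.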

ggplot2, error in filling the area under lines

I have this data set and I want to fill the area under each line. However I get an error saying:
Error: stat_bin() must not be used with a y aesthetic.
Additionally, I need to use alpha value for transparency. Any suggestions?
library(reshape2)
library(ggplot2)
dat <- data.frame(
a = rnorm(12, mean = 2, sd = 1),
b = rnorm(12, mean = 4, sd = 2),
month = c("JAN","FEB","MAR",'APR',"MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"))
dat$month <- factor(dat$month,
levels = c("JAN","FEB","MAR",'APR',"MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"),
ordered = TRUE)
dat <- melt(dat, id="month")
ggplot(data = dat, aes(x = month, y = value, colour = variable)) +
geom_line() +
geom_area(stat ="bin")
I want to fill the area under each line
This means we will need to specify the fill aesthetic.
I get an error saying "Error: stat_bin() must not be used with a y aesthetic."
This means we will need to delete your stat ="bin" code.
Additionally, I need to use alpha value for transparency.
This means we need to put alpha = <some value> in the geom_area layer.
Two other things: (1) since you have a factor on the x-axis, we need to specify a grouping so ggplot knows which points to connect. In this case we can use variable as the grouper. (2) The default "position" of geom_area is to stack the areas rather than overlap them. Because you ask about transparency I assume you want them overlapping, so we need to specify position = 'identity'.
ggplot(data = dat, aes(x = month, y = value, colour = variable)) +
geom_line() +
geom_area(aes(fill = variable, group = variable),
alpha = 0.5, position = 'identity')
To get lines across categorical variables, use the group aesthetic:
ggplot(data = dat, aes(x = month, y = value, colour = variable, group = variable)) +
#geom_line(position = 'stack') + # redundant, but this is where lines are drawn
geom_area(alpha = 0.5)
To change the color inside, use the fill aesthetic.

R: prevent break in line showing time series data using ggplot geom_line

Using ggplot2 I want to draw a line that changes colour after a certain date. I expected this to be simple, but I get a break in the line at the point the colour changes. Initially I thought this was a problem with group (as per this question; this other question also looked relevant but wasn't quite what I needed). Having messed around with the group aesthetic for 30 minutes I can't fix it, so if anybody can point out the obvious mistake...
Code:
require(ggplot2)
set.seed(1111)
mydf <- data.frame(mydate = seq(as.Date('2013-01-01'), by = 'day', length.out = 10),
y = runif(10, 100, 200))
mydf$cond <- ifelse(mydf$mydate > '2013-01-05', "red", "blue")
ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
geom_line() +
scale_colour_identity() +
theme()
If you set group=1, then 1 will be used as the group value for all data points, and the line will join up.
ggplot(mydf, aes(x = mydate, y = y, colour = cond, group=1)) +
geom_line() +
scale_colour_identity() +
theme()
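The effect of group = 1 can be checked without looking at the plot by counting the groups in the built layer. This is a small sketch using layer_data() from ggplot2; the object names p_default, p_single, groups_default and groups_single are hypothetical.

```r
library(ggplot2)

set.seed(1111)
mydf <- data.frame(
  mydate = seq(as.Date("2013-01-01"), by = "day", length.out = 10),
  y = runif(10, 100, 200)
)
mydf$cond <- ifelse(mydf$mydate > as.Date("2013-01-05"), "red", "blue")

# Default grouping: the discrete colour variable splits the data into groups
p_default <- ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
  geom_line()

# group = 1 puts every row into the same group, so one continuous line is drawn
p_single <- ggplot(mydf, aes(x = mydate, y = y, colour = cond, group = 1)) +
  geom_line()

groups_default <- length(unique(layer_data(p_default)$group))
groups_single  <- length(unique(layer_data(p_single)$group))

print(c(default = groups_default, single = groups_single))
```

Two groups in the default plot correspond to the two disconnected line pieces; a single group means the line is joined across the colour change.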
