Making smart multilevel histograms - r

I'm using RStudio Version 0.98.1028 on Windows.
I'd like to make a multilevel histogram using ggplot2. Let's say I have a 4D data frame like this:
facet <- as.factor(rep(c('alpha', 'beta', 'gamma'), each = 4, times = 3))
group <- as.factor(rep(c('X', 'Y'), each = 2, times = 9))
type <- as.factor(rep(c('a', 'b'), each = 1, times = 18))
day <- as.factor(rep(1:3, each = 12))
df = data.frame(facet = facet, group = group, type = type, day = day, value = abs(rnorm(36)))
I'd like to make histograms of x = day vs y = value in 3 facets, corresponding to facet, grouped by group and filled by type. In other words, I'd like to stack a and b in a single bar while keeping separate bars for X and Y. It would look something like
g = ggplot(df, aes(day, value, group = group, fill = type))
g + geom_histogram(stat = 'identity', position = 'dodge') +
facet_grid(facet ~ .)
Unfortunately with the dodge option I get unstacked histograms, while without I get 4 bars at each day. Any idea on how to solve this problem?
Using excel one facet should look something like this
Thanks in advance!
EB

Well, maybe your question is related to this one on the ggplot group.
A possible solution is the following:
g = ggplot(df, aes(group, value, fill = type))
g + geom_bar(stat = 'identity', position = 'stack') +
facet_grid(facet ~ day)
It's suboptimal because you are using two facets, but in this way you obtain this figure:

As pointed out by @Matteo, your specific wish is probably not directly achievable with the tooling provided by ggplot2. Below is a little bit of hacking which may point in the right direction. I am not endorsing it too much; I just spent a couple of minutes playing around with it. Maybe you can pick up a few of the elements.
I combined group and day into a single factor and, when plotting, replaced the x-labels manually with the (non-unique) group names. I then added the day labels (in a lazy manner). I still feel day x facet is the way you should proceed.
df$combinedCategory <- as.factor(paste(df$day,df$group))
library(scales)
g = ggplot(df, aes(combinedCategory, value, fill = type))
g = g + geom_bar(stat='identity',position = 'fill')
g = g + facet_grid(facet ~ .)
g = g + scale_y_continuous(labels = percent)
g = g + scale_x_discrete(labels = rep(c("X","Y"), times = 3))
g = g + geom_text(aes(x=1.5,y=0.05, label="Day 1"))
g = g + geom_text(aes(x=3.5,y=0.05, label="Day 2"))
g = g + geom_text(aes(x=5.5,y=0.05, label="Day 3"))
g = g + theme_minimal()
g
This gives the following:

Indeed, it is sufficient to set x = interaction(group, day) in aes(). This was actually my first step, but I was wondering if something more precise existed. Apparently not: the only tricky point here is creating a second-level row of x-axis labels. Thanks everybody!
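A minimal sketch of the interaction() approach, reusing the simulated data from the question. The seed, the `sep` string and the axis title are illustrative choices, not part of the original posts, and geom_col() is used here as a shorthand for geom_bar(stat = 'identity').

```r
library(ggplot2)

set.seed(1)  # assumption: seed chosen only to make the sketch reproducible
df <- data.frame(
  facet = factor(rep(c("alpha", "beta", "gamma"), each = 4, times = 3)),
  group = factor(rep(c("X", "Y"), each = 2, times = 9)),
  type  = factor(rep(c("a", "b"), times = 18)),
  day   = factor(rep(1:3, each = 12)),
  value = abs(rnorm(36))
)

# One x position per (group, day) pair; types a and b stack within each bar.
p <- ggplot(df, aes(x = interaction(group, day, sep = " / day "),
                    y = value, fill = type)) +
  geom_col(position = "stack") +
  facet_grid(facet ~ .) +
  labs(x = "group / day")
p
```

Because interaction() varies its first argument fastest, the X and Y bars of the same day end up next to each other on the axis.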

Related

adding a line to a ggplot boxplot

I'm struggling with ggplot2 and I've been looking for a solution online for several hours. Maybe one of you can give me a help? I have a data set that looks like this (several 100's of observations):
Y-AXIS     X-AXIS  SUBJECT
2.2796598  F1      1
0.9118639  F1      2
2.7111228  F3      3
2.7111228  F2      4
2.2796598  F4      5
2.3876401  F10     6
...        ...     ...
The Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.
To generate a box plot, I used ggplot like this:
plot1 <- ggplot(longdata,
aes(x = X_axis, y = Y_axis)) +
geom_boxplot() +
ylim(0, 12.5) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
That results in the boxplot I have in mind. You can check out the result here if you like: boxplot
So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line for their 10 scores in the same figure. So on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line
Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!
library(ggplot2)
It is always a good idea to add a reproducible example of your data;
you can always simulate what you need.
set.seed(123)
simulated_data <- data.frame(
subject = rep(1:10, each = 10),
xaxis = rep(paste0('F', 1:10), times = 10),
yaxis = runif(100, 0, 100)
)
In ggplot, each geom can take a data argument; for your line, just use
a subset of your original data, limited to the desired subject.
Colors and other visual elements for the line are simple to set; take a look here
ggplot() +
geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
geom_line(
data = simulated_data[simulated_data$subject == 1,],
aes(xaxis, yaxis),
color = 'red',
linetype = 2,
size = 1,
group = 1
)
Created on 2022-10-14 with reprex v2.0.2
library(ggplot2)
library(dplyr)
# Simulate some data absent a reproducible example
testData <- data.frame(
y = runif(300,0,100),
x = as.factor(paste0("F",rep(1:10,times=30))),
SUBJECT = as.factor(rep(1:30, each = 10))
)
# Copy your plot with my own data + ylimits
plot1 <- ggplot(testData,
aes(x = x, y = y)) +
geom_boxplot() +
ylim(0, 100) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
# add the geom_line for subject 1
plot1 +
geom_line(data = filter(testData, SUBJECT == 1),
mapping = aes(x=x, y=y, group = SUBJECT))
My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!

How do I correctly connect data points ggplot

I am making a stratigraphic plot but somehow, my data points don't connect correctly.
The purpose of this plot is that the values on the x-axis are connected so you get an overview of the change in d18O throughout time (age, ma).
I've used the following script:
library(readxl)
R_pliocene_tot <- read_excel("Desktop/R_d18o.xlsx")
View(R_pliocene_tot)
install.packages("analogue")
install.packages("gridExtra")
library(tidyverse)
R_pliocene_Rtot <- R_pliocene_tot %>%
gather(key=param, value=value, -age_ma)
R_pliocene_Rtot
R_pliocene_Rtot %>%
ggplot(aes(x=value, y=age_ma)) +
geom_path() +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
which leads to the following figure:
Something is wrong with the geom_path function, I guess, but I can't figure out what it is.
Although the comment seems to solve the problem, I don't think the question itself was answered. So here is a short introduction to how geom_path works in ggplot2.
library(dplyr)
library(ggplot2)
# This dataset contains two groups, with random values for y and x running from 1 to 20.
# The param column is only there to replicate the param variable from the question.
df <- tibble(x = rep(seq(1, 20, by = 1), 2),
y = runif(40, min = 1, max = 100),
group = c(rep("group 1", 20), rep("group 2", 20)),
param = rep("a param", 40))
df %>%
ggplot(aes(x = x, y = y)) +
# geom_path has a group aesthetic that tells the function
# which data points belong to which path.
# Points in the same group are connected together.
# Here I use color to help distinguish the paths a bit more.
geom_path(aes(group = group, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
Since your data works well with group = 1, I guess all the data points belong to one group and you just want to draw a line connecting all of them. If you take my example data above and draw it with the aesthetic group = 1, the result has two lines similar to the example above, but the end point of group 1 is now connected to the starting point of group 2.
So all the data points are now on one path, but the order in which they are drawn depends on the order in which they appear in the data. (I keep the color just to make this a bit clearer.)
df %>%
ggplot(aes(x = x, y = y)) +
geom_path(aes(group = 1, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
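Because geom_path() joins points in row order, one way to get a clean stratigraphic line is to sort each series by age before plotting. The sketch below uses simulated values; the column names mirror the question, but the data frame `strat` and the series names `d18O_a` and `d18O_b` are hypothetical.

```r
library(ggplot2)

set.seed(42)  # assumption: seed chosen only for reproducibility
# Simulated stand-in for the long-format data, deliberately shuffled.
strat <- data.frame(
  age_ma = rep(seq(2.5, 5.3, length.out = 15), times = 2),
  param  = rep(c("d18O_a", "d18O_b"), each = 15),
  value  = runif(30, 2, 4)
)
strat <- strat[sample(nrow(strat)), ]

# Sort by series, then by age, so geom_path() walks each series in age order.
strat_sorted <- strat[order(strat$param, strat$age_ma), ]

p <- ggplot(strat_sorted, aes(x = value, y = age_ma)) +
  geom_path() +
  geom_point() +
  facet_wrap(~param, scales = "free_x") +
  scale_y_reverse() +
  labs(x = NULL, y = "Age (ma)")
p
```

Plotting the unsorted `strat` instead should reproduce the kind of zigzag seen in the question, which makes the effect of row order easy to check.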
Hope this gives you a better understanding of ggplot2::geom_path

Time series data using ggplot: how use different color for each time point and also connect with lines data belonging to each subject?

I have data from several cells which I tested in several conditions: a few times before and also a few times after treatment. In ggplot, I use color to indicate different times of testing.
Additionally, I would like to connect with lines all data points which belong to the same cell. Is that possible?...
Here is my example data (https://www.dropbox.com/s/eqvgm4yu6epijgm/df.csv?dl=0) and a simplified code for the plot:
df$condition = as.factor(df$condition)
df$cell = as.factor(df$cell)
df$condition <- factor(df$condition, levels = c("before1", "before2", "after1", "after2", "after3"))
windows(width=8,height=5)
ggplot(df, aes(x=condition, y=test_variable, color=condition)) +
labs(title="", x = "Condition", y = "test_variable", color="Condition") +
geom_point(aes(color=condition),size=2,shape=17, position = position_jitter(w = 0.1, h = 0))
I think your code is heading in the wrong direction: you should instead group the points by the column Cell (keeping the color mapped to condition). Then, if I'm right, you want to see how the variable evolves for each cell before and after a treatment, so you can order the x variable using scale_x_discrete.
Altogether, you can do something like this:
library(ggplot2)
ggplot(df, aes(x = condition, y = variable, group = Cell)) +
geom_point(aes(color = condition))+
geom_line(aes(color = condition))+
scale_x_discrete(limits = c("before1","before2","after1","after2","after3"))
Does it look like what you are expecting?
Data
df = data.frame(Cell = c(rep("13a",5),rep("1b",5)),
condition = rep(c("before1","before2","after1","after2","after3"),2),
variable = c(58,55,36,29,53,57,53,54,52,52))

Add pair lines in R

I have some data measured pair-wise (e.g. 1C, 1M, 2C and 2M), which I have plotted separately (as C and M). However, I would like to add a line between each pair (e.g. a line from point 1 in the C column to point 1 in the M 'column').
A small section of the entire dataset:
PairNumber Type M
1 M 0.117133
2 M 0.054298837
3 M 0.039734
4 M 0.069247069
5 M 0.043053957
1 C 0.051086898
2 C 0.075519
3 C 0.065834198
4 C 0.084632915
5 C 0.054254946
I have generated the below picture using the following tiny R snippet:
boxplot(test$M ~ test$Type)
stripchart(test$M ~ test$Type, vertical = TRUE, method="jitter", add = TRUE, col = 'blue')
Current plot:
I would like to know what command or what function I would need to achieve this (a rough sketch of the desired result, with only some of the lines, is presented below).
Desired plot:
Alternatively, doing this with ggplot is also fine by me, I have the following alternative ggplot code to produce a plot similar to the first one above:
ggplot(,aes(x=test$Type, y=test$M)) +
geom_boxplot(outlier.shape=NA) +
geom_jitter(position=position_jitter(width=.1, height=0))
I have been trying geom_path, but I have not found the correct syntax to achieve what I want.
I would probably recommend breaking this up into multiple visualizations -- with more data, I feel this type of plot would become difficult to interpret. In addition, I am not sure it's possible to draw the geom_lines and connect them with the additional call to geom_jitter. That being said, this gets you most of the way there:
ggplot(df, aes(x = Type, y = M)) +
geom_boxplot(outlier.shape = NA) +
geom_line(aes(group = PairNumber)) +
geom_point()
The trick is to specify your group aesthetic within geom_line() and not up top within ggplot().
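If the points should be jittered and still meet their connecting lines, one option is to compute the horizontal offsets once and reuse them in both layers. This is only a sketch under assumptions: the column `x_jit` and the seed are made up for illustration, and it relies on ggplot2 accepting numeric positions on a discrete x scale that an earlier layer has already set up.

```r
library(ggplot2)

# The sample rows from the question.
test <- data.frame(
  PairNumber = rep(1:5, times = 2),
  Type = rep(c("M", "C"), each = 5),
  M = c(0.117133, 0.054298837, 0.039734, 0.069247069, 0.043053957,
        0.051086898, 0.075519, 0.065834198, 0.084632915, 0.054254946)
)

set.seed(1)  # assumption: any seed works, it only fixes the jitter
# Numeric position of each Type (C = 1, M = 2) plus a small horizontal offset.
test$x_jit <- as.numeric(factor(test$Type)) + runif(nrow(test), -0.1, 0.1)

p <- ggplot(test, aes(x = Type, y = M)) +
  geom_boxplot(outlier.shape = NA) +
  # Lines and points share the same precomputed x, so they always coincide.
  geom_line(aes(x = x_jit, group = PairNumber), colour = "grey50") +
  geom_point(aes(x = x_jit), colour = "blue")
p
```

Storing the offset in the data avoids depending on two separate jitter calls producing identical random shifts.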
Additional Note: No reason to fully qualify your aesthetic variables within ggplot() -- that is, no reason to do ggplot(data = test, aes(x = test$Type, y = test$M)); rather, just use: ggplot(data = test, aes(x = Type, y = M)).
UPDATE
Leveraging cowplot to visualize this data in different plots could prove helpful:
library(cowplot)
p1 <- ggplot(df, aes(x = Type, y = M, color = Type)) +
geom_boxplot()
p2 <- ggplot(df, aes(x = Type, y = M, color = Type)) +
geom_jitter(position = position_jitter(width = 0.1, height = 0))
p3 <- ggplot(df, aes(x = M, color = Type, fill = Type)) +
geom_density(alpha = 0.5)
p4 <- ggplot(df, aes(x = Type, y = M)) +
geom_line(aes(group = PairNumber, color = factor(PairNumber)))
plot_grid(p1, p2, p3, p4, labels = c(LETTERS[1:4]), align = "v")

Density plot in ggplot [duplicate]

In the dataframe below, I would expect the y axis values for density be 0.6 and 0.4, yet they are 1.0. I feel there is obviously something extremely basic that I am missing about the way I am using ..density.. but am brain freezing. How would I obtain the desired behavior using ..density.. Any help would be appreciated.
df <- data.frame(a = c("yes","no","yes","yes","no"))
m <- ggplot(df, aes(x = a))
m + geom_histogram(aes(y = ..density..))
Thanks,
--JT
As per @Arun's comment:
At the moment, yes and no belong to different groups. To make them part of the same group set a grouping aesthetic:
m <- ggplot(df, aes(x = a , group = 1)) # 'group = 1' sets the group of all x to 1
m + geom_histogram(aes(y = ..density..))
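Recent ggplot2 releases may reject a discrete x in geom_histogram(), and the ..density.. notation has been superseded by after_stat(). A sketch of the same idea, assuming ggplot2 3.3.0 or later, uses geom_bar() with the `prop` variable computed by stat_count():

```r
library(ggplot2)

df <- data.frame(a = c("yes", "no", "yes", "yes", "no"))

# stat_count() computes `prop` within each group; group = 1 pools all rows,
# so each bar's height is its share of the whole data set.
p <- ggplot(df, aes(x = a)) +
  geom_bar(aes(y = after_stat(prop), group = 1))
p
```

With this data the bars come out at 0.4 for "no" and 0.6 for "yes". Without group = 1, each x value forms its own group and every bar has a proportion of 1.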
