Separated histograms in R

I'm using the following code to produce three different histograms in the same graph. Is it possible to separate them into three graphs stacked underneath each other, with the same x-axis scaling for all three? As an alternative, I thought about turning the three histograms into densities and keeping them in the same graph.
require(ggplot2)
require(reshape2)
set.seed(1)
df <- data.frame(x = rnorm(n = 1000, mean = 2, sd = 0.2),
y = rnorm(n = 1000, mean = 2),
z = rnorm(n = 1000, mean = 2))
ggplot(melt(df), aes(value, fill = variable)) + geom_histogram(position = "dodge")+ scale_fill_manual(values = c('red','black','green'))
Thanks.

Try this:
gg <- melt(df)
ggplot(gg) + geom_histogram(aes(x = value, fill = variable)) + facet_grid(variable ~ .)
The function melt(...) transforms your data from "wide" format (values in different columns) to "long" format (values in one column, with an extra column indicating which value goes with which variable). This is the preferred format for ggplot. Then facet_grid(...) puts the different variables (x, y, z) into different graphs (or panels), stacked vertically and sharing the same x-axis by default.
Use this for densities:
ggplot(gg) +
stat_density(aes(x = value, color = variable), geom = "line", position = "identity")
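For readers who want to see the reshaping itself, here is a minimal sketch of what melt() does to the data frame from the question. The column names variable and value are reshape2's defaults; the object name long_df is only illustrative and does not appear in the answer.

```r
library(reshape2)

set.seed(1)
df <- data.frame(x = rnorm(n = 1000, mean = 2, sd = 0.2),
                 y = rnorm(n = 1000, mean = 2),
                 z = rnorm(n = 1000, mean = 2))

# Wide: 1000 rows, one column per variable.
# Long: 3000 rows, a 'variable' column (x, y or z) and a 'value' column.
long_df <- melt(df, measure.vars = c("x", "y", "z"))

dim(df)
dim(long_df)
table(long_df$variable)
```

Faceting or colouring by variable then works because every observation carries its own group label.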

Related

adding a line to a ggplot boxplot

I'm struggling with ggplot2 and I've been looking for a solution online for several hours. Maybe one of you can help me? I have a data set that looks like this (several hundred observations):
Y-AXIS     X-AXIS  SUBJECT
2.2796598  F1      1
0.9118639  F1      2
2.7111228  F3      3
2.7111228  F2      4
2.2796598  F4      5
2.3876401  F10     6
...        ...     ...
The Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.
To generate a box plot, I used ggplot like this:
plot1 <- ggplot(longdata,
aes(x = X_axis, y = Y_axis)) +
geom_boxplot() +
ylim(0, 12.5) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
That results in the boxplot I have in mind. You can check out the result here if you like: boxplot
So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line for their 10 scores in the same figure. So on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line
Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!
library(ggplot2)
# It is always a good idea to add a reproducible example of your data;
# you can always simulate what you need.
set.seed(123)
simulated_data <- data.frame(
subject = rep(1:10, each = 10),
xaxis = rep(paste0('F', 1:10), times = 10),
yaxis = runif(100, 0, 100)
)
# In ggplot each geom can take a data argument; for your line just use
# a subset of your original data, limited to the desired subject.
# Colors and other visual elements for the line are simple, take a look here
ggplot() +
geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
geom_line(
data = simulated_data[simulated_data$subject == 1,],
aes(xaxis, yaxis),
color = 'red',
linetype = 2,
size = 1,
group = 1
)
Created on 2022-10-14 with reprex v2.0.2
library(ggplot2)
library(dplyr)
# Simulate some data absent a reproducible example
testData <- data.frame(
y = runif(300,0,100),
x = as.factor(paste0("F",rep(1:10,times=30))),
SUBJECT = as.factor(rep(1:30, each = 10))
)
# Copy your plot with my own data + ylimits
plot1 <- ggplot(testData,
aes(x = x, y = y)) +
geom_boxplot() +
ylim(0, 100) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
# add the geom_line for subject 1
plot1 +
geom_line(data = filter(testData, SUBJECT == 1),
mapping = aes(x=x, y=y, group = SUBJECT))
My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!
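One detail worth checking in both answers is the order of the categories. A minimal sketch, assuming the categories are named F1 to F10 as in the question (the data frame scores is hypothetical): stored as plain text they sort alphabetically, so F10 lands right after F1 on the axis, and the subject's line follows that order too.

```r
# Hypothetical data in the same shape as the simulated examples above
set.seed(123)
scores <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis = rep(paste0("F", 1:10), times = 10),
  yaxis = runif(100, 0, 100)
)

# As plain text the categories sort alphabetically: F1, F10, F2, ...
sort(unique(scores$xaxis))

# Fixing the level order keeps F10 last on the axis and along the line
scores$xaxis <- factor(scores$xaxis, levels = paste0("F", 1:10))
levels(scores$xaxis)
```

With the factor in place, the same geom_boxplot() and geom_line() calls can be used unchanged.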

Histogram with discontinuous x-axis

I need to create a histogram in R. I have added a picture to show the desired result. I tried both ggplot2 and the base function hist. I used this code (ggplot) to get the basic histogram, but I would like to set the x-axis as shown in the figure (exactly the same values). Can someone tell me how to do that?
My input file DataLig2 contains a list of objects, each with an associated value (N..of.similar..Glob.Sum...0.83..ligandable.pockets). I need to plot the frequencies of all the reported values. The lowest value is 1 and the highest is 28. There are no values from 16 to 27, so I would like to skip this range in my plot.
Example of input file:
Object;N..of.similar..Glob.Sum...0.83..ligandable.pockets
1b47_A_001;3
4re2_B_003;1
657w_H_004;13
1gtr_A_003;28
...
my script:
ggplot(dataLig2, aes(dataLig2$N..of.similar..Glob.Sum...0.83..ligandable.pockets, fill = group)) + geom_histogram(color="black") +
scale_fill_manual(values = c("1-5" = "olivedrab1",
"6-10" = "limegreen",
"11-28" = "green4"))
Can you also suggest a script with the hist base function to get the same graph (with spaced bars as in the figure shown)? Thank you!
Using ggplot, set x as factor, missing numbers as "...", and set to plot unused levels, see example:
library(ggplot2)
# reproducible example data
# where 8 and 9 are missing
set.seed(1); d <- data.frame(x = sample(c(1:7, 10), 100, replace = TRUE))
# add missing 8 and 9 as labels
d$x1 <- factor(d$x, levels = 1:10, labels = c(1:7, "...", "...", 10))
#compare
cowplot::plot_grid(
ggplot(d, aes(x)) +
geom_bar() +
ggtitle("before") +
scale_x_continuous(breaks = 1:10),
ggplot(d, aes(x = x1)) +
geom_bar() +
scale_x_discrete(drop = FALSE) +
ggtitle("after"))
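The question also asks for a base-graphics version. hist() needs a continuous axis, so the sketch below swaps in table() plus barplot() and applies the same idea as the ggplot answer: treat the values as a factor and insert a "..." level for the skipped range. The vector pockets is simulated stand-in data, not the asker's file, and the axis labels are placeholders.

```r
set.seed(1)

# Simulated stand-in for the pocket counts: values 1..15 plus 28
pockets <- sample(c(1:15, 28), 200, replace = TRUE)

# Axis categories in display order; "..." marks the skipped range 16-27
axis_levels <- c(as.character(1:15), "...", "28")

# table() on a factor keeps empty levels, so "..." gets a zero-height bar
counts <- table(factor(as.character(pockets), levels = axis_levels))

# space > 0 leaves gaps between bars
barplot(counts, space = 0.3,
        xlab = "Number of similar pockets", ylab = "Frequency")
```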

violin_plot() with continuous axis for grouping variable?

The grouping variable for creating a geom_violin() plot in ggplot2 is expected to be discrete for obvious reasons. However my discrete values are numbers, and I would like to show them on a continuous scale so that I can overlay a continuous function of those numbers on top of the violins. Toy example:
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df) + geom_violin(aes(x=factor(x), y=y))
This works as you'd imagine: violins with their x axis values (equally spaced) labelled 1, 2, and 5, with their means at y=1,2,5 respectively. I want to overlay a continuous function such as y=x, passing through the means. Is that possible? Adding + scale_x_continuous() predictably gives Error: Discrete value supplied to continuous scale. A solution would presumably spread the violins horizontally by the numeric x values, i.e. three times the spacing between 2 and 5 as between 1 and 2, but that is not the only thing I'm trying to achieve - overlaying a continuous function is the key issue.
If this isn't possible, alternative visualisation suggestions are welcome. I know I could replace violins with a simple scatter plot to give a rough sense of density as a function of y for a given x.
The functionality to plot violin plots on a continuous scale is directly built into ggplot.
The key is to keep the original continuous variable (instead of transforming it into a factor variable) and specify how to group it within the aesthetic mapping of geom_violin(). The width of the groups can be modified with the width argument of cut_width(), depending on the data at hand.
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'lm')
By using this approach, all geoms for continuous data and their varying functionalities can be combined with the violin plots, e.g. we could easily replace the line with a loess curve and add a scatter plot of the points.
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'loess') +
geom_point()
More examples can be found in the ggplot helpfile for violin plots.
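To see why the grouping works, it can help to look at what cut_width() returns on its own. A small sketch with made-up x values (the vector and the name groups are illustrative only):

```r
library(ggplot2)

x <- c(1, 2, 5, 1, 2, 5)

# Bins of width 1; each distinct x value falls into its own bin
groups <- cut_width(x, width = 1)

# Bins that contain data, and how many points each one holds
table(droplevels(groups))
```

Each occupied bin becomes one violin, drawn at its position on the continuous x-axis. If the x values were closer together than the chosen width, several of them would share a bin and be merged into a single violin.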
Try this. As you already guessed, spreading the violins by numeric values is the key to the solution. To this end I expand the df to include all x values in the interval min(x) to max(x) and use scale_x_discrete(drop = FALSE) so that all values are displayed.
Note: Thanks @ChrisW for the more general example of my approach.
library(tidyverse)
set.seed(42)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T), y = rnorm(1000, mean = x^2))
# y = x^2
# add missing x values
x.range <- seq(from=min(df$x), to=max(df$x))
df <- df %>% right_join(tibble(x = x.range))
#> Joining, by = "x"
# Whatever the desired continuous function is:
df.fit <- tibble(x = x.range, y=x^2) %>%
mutate(x = factor(x))
ggplot() +
geom_violin(data=df, aes(x = factor(x, levels = 1:5), y=y)) +
geom_line(data=df.fit, aes(x, y, group=1), color = "red") +
scale_x_discrete(drop = FALSE)
#> Warning: Removed 2 rows containing non-finite values (stat_ydensity).
Created on 2020-06-11 by the reprex package (v0.3.0)

Individual binwidths in faceted histogram on ggplot2

I make a series of histograms with facet_grid and I want every histogram in the grid to have the same number of classes, e.g. 6 classes in the example below. The problem is that binwidth = diff(range(x$data))/6 defines the classes according to the overall range of a, b and c, i.e. it defines one binwidth for all three facets.
How do I define binwidth individually for the facets a, b and c?
require("ggplot2")
a <- c(1.21,1.57,1.21,0.29,0.36,0.29,0.93,0.26,0.28,0.48,
0.12,0.38,0.83,0.82,0.41,0.69,0.25,0.98,0.52,0.11)
b <- c(0.42,0.65,0.17,0.38,0.44,0.01,0.01,0.03,0.15,0.01)
c <- c(1.09,3.55,1.07,4.55,0.55,0.11,0.72,0.66,1.22,3.04,
2.01,0.64,0.47,1.33,3.44)
x <- data.frame(data = c(a,b,c), variable = c(rep("a",20),rep("b",10),rep("c",15)),area="random")
qplot(data, data = x, geom = "histogram", binwidth = diff(range(x$data))/6) +
facet_grid(area~variable, scales = "free")
This is not optimal but you can do the histogram in different layers:
ggplot(x, aes(x=data)) +
geom_histogram(data=subset(x, variable=="a"), binwidth=.1) +
geom_histogram(data=subset(x, variable=="b"), binwidth=.2) +
geom_histogram(data=subset(x, variable=="c"), binwidth=.5) +
facet_grid(area~variable, scales="free")
One way is to pre-summarize your data in the way you want it, then to create the plot.
In your case, you need to bin your variables using the function cut(). The package dplyr is convenient for this, because it allows you to specify a mutate function for each group of your data:
library(dplyr)
zz <- x %>%
group_by(variable) %>%
mutate(
bins = cut(data, breaks=6)
)
# bins is a factor, so count its levels with geom_bar rather than binning again
ggplot(zz, aes(x = bins)) + geom_bar(fill = "blue") +
facet_grid(area~variable, scales = "free") +
theme(axis.text.x = element_text(angle=90))
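The two answers can be combined: compute one bin width per facet from that facet's own range, then add one histogram layer per facet. This is only a sketch under that assumption; the names binwidths and layers are illustrative.

```r
library(ggplot2)

a <- c(1.21,1.57,1.21,0.29,0.36,0.29,0.93,0.26,0.28,0.48,
       0.12,0.38,0.83,0.82,0.41,0.69,0.25,0.98,0.52,0.11)
b <- c(0.42,0.65,0.17,0.38,0.44,0.01,0.01,0.03,0.15,0.01)
c <- c(1.09,3.55,1.07,4.55,0.55,0.11,0.72,0.66,1.22,3.04,
       2.01,0.64,0.47,1.33,3.44)
x <- data.frame(data = c(a, b, c),
                variable = c(rep("a", 20), rep("b", 10), rep("c", 15)),
                area = "random")

# One bin width per facet: that facet's range divided into 6 classes
binwidths <- tapply(x$data, x$variable, function(v) diff(range(v)) / 6)

# One histogram layer per facet, each with its own bin width
layers <- lapply(names(binwidths), function(v) {
  geom_histogram(data = subset(x, variable == v), binwidth = binwidths[[v]])
})

p <- ggplot(x, aes(x = data)) + layers +
  facet_grid(area ~ variable, scales = "free")
p
```

Depending on where ggplot2 places the bin edges, a facet may show one bin more than the nominal 6; passing explicit breaks per layer would pin this down.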

Plot two graphs in the same plot [duplicate]

This question already has answers here:
Plotting two variables as lines using ggplot2 on the same graph
(5 answers)
Closed 4 years ago.
The solution with ggplot in this question worked really well for my data. However, I am trying to add a legend and everything that I tried does not work...
For example, in the ggplot example in the above question, how I can add a legend to show that the red curve is related to "Ocean" and the green curve is related to "Soil"? Yes, I want to add text that I will define and it is not related to any other variable in my data.frame.
The example below is some of my own data...
Rate Probability Stats
1.0e-04 1e-04 891.15
1.0e-05 1e-04 690
...
etc (it's about 400 rows). And I have two data frames similar to the above one.
So my code is:
g <- ggplot(Master1MY, aes(Probability))
g <- g + geom_point(aes(y=Master1MY$Stats), colour="red", size=1)
g <- g + geom_point(aes(y=Transposon1MY$Stats), colour="blue", size=1)
g + labs(title= "10,000bp and 1MY", x = "Probability", y = "Stats")
The plot looks like
I just want a red and blue legend saying "Master" and "Transposon"
Thanks!
In ggplot it is generally most convenient to keep the data in a 'long' format. Here I use the function melt from the reshape2 package to convert your data from wide to long format. Depending on how you specify different aesthetics (size, shape, colour, etc.), corresponding legends will appear.
library(ggplot2)
library(reshape2)
# data from the example you were referring to, in a 'wide' format.
x <- seq(-2, 2, 0.05)
ocean <- pnorm(x)
soil <- pnorm(x, 1, 1)
df <- data.frame(x, ocean, soil)
# melt the data to a long format
df2 <- melt(data = df, id.vars = "x")
# plot, using the aesthetics argument 'colour'
ggplot(data = df2, aes(x = x, y = value, colour = variable)) + geom_line()
Edit, set name and labels of legend
# Manually set name of the colour scale and labels for the different colours
ggplot(data = df2, aes(x = x, y = value, colour = variable)) +
geom_line() +
scale_colour_discrete(name = "Type of sample", labels = c("Sea water", "Soil"))
Edit2, following new sample data
Convert your data, assuming its organization from your update, to a long format. Again, I believe you make your ggplot life easier if you keep your data in a long format. I relate every step with the simple example data which I used in my first answer. Please note that there are many alternative ways to rearrange your data. This is one way, based on the small (non-reproducible) parts of your data you provided in the update.
# x <- seq(-2, 2, 0.05)
# Master1MY$Probability
Probability <- 1:100
# ocean <- pnorm(x)
# Master1MY$Stats
Master1MY <- rnorm(100, mean = 600, sd = 20)
# soil <- pnorm(x,1,1)
# Transposon1MY$Stats
Transposon1MY <- rnorm(100, mean = 100, sd = 10)
# df <- data.frame(x, ocean, soil)
df <- data.frame(Probability, Master1MY, Transposon1MY)
# df2 <- melt(df, id.vars = "x")
df2 <- melt(df, id.vars = "Probability")
# default
ggplot(data = df2, aes(x = Probability, y = value, col = variable)) +
geom_point()
# change legend name and labels, see previous edit using 'scale_colour_discrete'
# set manual colours scale using 'scale_colour_manual'.
ggplot(data = df2, aes(x = Probability, y = value, col = variable)) +
geom_point() +
scale_colour_manual(values = c("red","blue"), name = "Type of sample", labels = c("Master", "Transposon"))
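If the two data frames are kept separate, as in the question's own code, a legend can also be obtained by mapping colour to a constant label inside aes() for each layer. A minimal sketch with simulated stand-ins for the two data frames (master and transposon are illustrative names, and the numbers are made up):

```r
library(ggplot2)

set.seed(1)
# Simulated stand-ins for Master1MY and Transposon1MY
master <- data.frame(Probability = 1:100,
                     Stats = rnorm(100, mean = 600, sd = 20))
transposon <- data.frame(Probability = 1:100,
                         Stats = rnorm(100, mean = 100, sd = 10))

# colour inside aes() is treated as data, so ggplot builds a legend for it;
# scale_colour_manual() then assigns the actual colours to the labels
p <- ggplot(mapping = aes(x = Probability, y = Stats)) +
  geom_point(data = master, aes(colour = "Master"), size = 1) +
  geom_point(data = transposon, aes(colour = "Transposon"), size = 1) +
  scale_colour_manual(name = "Type of sample",
                      values = c(Master = "red", Transposon = "blue")) +
  labs(title = "10,000bp and 1MY", x = "Probability", y = "Stats")
p
```

The long-format approach scales better to many groups; this variant is mainly convenient when there are only two or three separate data frames.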
