I am trying to generate a scatter plot where the x-axis is split into several categories, each spanning its own continuous range. The closest thing to it would be a Manhattan plot, where the x-axis is split by chromosome (categorical), but within each category the values are continuous.
Data:
chr <- sample(x = c(1,2), replace = T, size = 1000)
bp <- as.integer(runif(n = 1000, min = 0, max = 10000))
p <- runif(n = 1000, min = 0, max = 1)
df <- data.frame(chr,bp,p)
Starting Point:
ggplot(df, aes(y = -log10(p), x =bp)) + geom_point(colour=chr)
The red and black points should be separate categories along the x-axis.
I am not sure if I have understood your question. Probably you are looking for facets. See the example.
require(ggplot2)
chr <- sample(x = c(1,2), replace = T, size = 1000)
bp <- as.integer(runif(n = 1000, min = 0, max = 10000))
p <- runif(n = 1000, min = 0, max = 1)
df <- data.frame(chr,bp,p)
ggplot(df, aes(y = -log10(p), x = bp)) +
  geom_point(aes(colour = factor(chr))) +
  facet_wrap("chr")
If you really want to do this in a single plot instead of facets, you could conditionally rescale your x variable and then manually adjust the labels, e.g.:
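library(dplyr)  # provides %>% and mutate()
df %>%
  # shift chromosome 2 to the right so it occupies 10000-20000 on the shared axis
  mutate(bp.scaled = ifelse(chr == 2, bp + 10000, bp)) %>%
  ggplot(aes(y = -log10(p), x = bp.scaled)) + geom_point(colour=chr) +
  # relabel the shifted half so each chromosome reads 0-10000 again
  scale_x_continuous(breaks = seq(0,20000,2500),
                     labels = c(seq(0,10000,2500), seq(2500,10000,2500)))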
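The same rescaling idea can be extended to any number of chromosomes by adding a cumulative offset per chromosome, which is roughly how Manhattan plots are usually laid out. The following is only a sketch under assumptions: the three-chromosome test data, the use of the largest observed bp as each chromosome's length, and the names chr_len, offset, bp_cum and centers are all illustrative and not part of the original question.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  chr = sample(1:3, size = 1000, replace = TRUE),
  bp  = as.integer(runif(1000, min = 0, max = 10000)),
  p   = runif(1000)
)

# Assumption: the largest observed position stands in for the chromosome length
chr_len <- sapply(split(df$bp, df$chr), max)

# Each chromosome starts where the previous ones end
offset <- cumsum(c(0, head(chr_len, -1)))
names(offset) <- names(chr_len)

# Position on the shared axis, and one axis label per chromosome at its midpoint
df$bp_cum <- df$bp + unname(offset[as.character(df$chr)])
centers <- offset + chr_len / 2

ggplot(df, aes(x = bp_cum, y = -log10(p), colour = factor(chr))) +
  geom_point() +
  scale_x_continuous(breaks = as.numeric(centers), labels = names(centers)) +
  labs(x = "chromosome", colour = "chr")
```

Labelling the midpoint of each chromosome avoids having to repeat the within-chromosome tick values by hand.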
I'm trying to make a scaled histogram in such a way that the transparency of each "column" (bin?) depends on the number of observations in a given range of x. Here is my code:
set.seed(1)
test = data.frame(x = rnorm(200, mean = 0, sd = 10),
y = as.factor(sample(c(0,1), replace=TRUE, size=100)))
threshold = 20
ggplot(test,
aes(x = x))+
geom_histogram(aes(fill = y, alpha = stat(count) > threshold),
position = "fill", bins = 10)
Basically, I want to make plots that look like this:
However, my code generates plots where the transparency is applied based on the count after grouping, which ends up with hanging columns like this:
For this example, in order to simulate a "proper" plot I just adjusted the threshold, but I need alpha to consider the sum of counts from both groups in a given "column" (bin).
UPDATE:
I also want it to work with faceted plots in such a way that the highlighted area in each facet is independent of the other facets. The approach proposed by @Stefan works perfectly for an individual plot, but in a faceted plot it highlights the same area in all facets.
library(ggplot2)
set.seed(1)
test = data.frame(x = rnorm(1000, mean = 0, sd = 10),
y = as.factor(sample(c(0,1), replace=TRUE, size=1000)),
n = as.factor(sample(c(0,1,2), replace=TRUE, size=1000)),
m = as.factor(sample(c(0,1,3,4), replace=TRUE, size=1000)))
f = function(..count.., ..x..) tapply(..count.., factor(..x..), sum)[factor(..x..)]
threshold = 10
ggplot(test,
aes(x = x))+
geom_histogram(aes(fill = y, alpha = f(..count.., ..x..) > threshold),
position = "fill", bins = 10)+
facet_grid(rows = vars(n),
cols = vars(m))
This could be achieved like so:
As the count computed by stat_bin (the stat behind geom_histogram) is the number of obs after grouping, we have to manually aggregate the count over groups to get the total count per bin.
To aggregate the counts per bin I use tapply, where I make use of the .. notation to get the variables computed by stat_bin.
As the grouping variable I make use of the computed variable ..x.., which to the best of my knowledge is not documented. Basically ..x.. contains by default the midpoints of the bins and as such can be used as an identifier for the bins. However, as these are continuous values we have to convert them to a factor.
Finally, to make the code more readable I use an auxiliary function to compute the aggregate counts. Additionally I double the threshold value to 20.
library(ggplot2)
set.seed(1)
test <- data.frame(
x = rnorm(200, mean = 0, sd = 10),
y = as.factor(sample(c(0, 1), replace = TRUE, size = 100))
)
threshold <- 20
f <- function(..count.., ..x..) tapply(..count.., factor(..x..), sum)[factor(..x..)]
p <- ggplot(
test,
aes(x = x)
) +
geom_histogram(aes(fill = y, alpha = f(..count.., ..x..) > threshold),
position = "fill", bins = 10
)
p
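To see what the helper does in isolation, the tapply idiom can be tried on a toy vector outside ggplot2. This is only an illustration: count and x below are made-up stand-ins for the values the stat would supply (two groups, two bins with midpoints -5 and 5).

```r
# Made-up stand-ins for the computed variables: one row per group and bin
count <- c(5, 3, 2, 8)
x     <- c(-5, 5, -5, 5)   # bin midpoints, repeated once per group

bin <- factor(x)

# Sum the counts per bin, then index by the factor to expand the
# per-bin totals back to one value per original row
totals <- tapply(count, bin, sum)[bin]

as.vector(totals)
# 7 11 7 11
```

Indexing with the factor uses its integer codes, so every row receives the total of the bin it belongs to.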
EDIT: To allow for faceting we have to pass the ..PANEL.. identifier to the function as an additional argument. Instead of using tapply I now use dplyr::add_count to compute the total count per bin and facet panel:
library(ggplot2)
library(dplyr)
set.seed(1)
test <- data.frame(
x = rnorm(200, mean = 0, sd = 10),
y = as.factor(sample(c(0, 1), replace = TRUE, size = 100)),
type = rep(c("A", "B"), each = 100)
)
threshold <- 20
f <- function(count, x, PANEL) {
  data.frame(count, x, PANEL) %>%
    add_count(x, PANEL, wt = count) %>%
    pull(n)
}
p <- ggplot(
test,
aes(x = x)
) +
geom_histogram(aes(fill = y, alpha = f(..count.., ..x.., ..PANEL..) > threshold),
position = "fill", bins = 10
) +
facet_wrap(~type)
p
#> Warning: Using alpha for a discrete variable is not advised.
#> Warning: Removed 2 rows containing missing values (geom_bar).
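In ggplot2 3.4.0 and later the ..name.. notation is deprecated in favour of after_stat(), so the same idea could probably be written as below. This is a sketch rather than a tested replacement: the helper name bin_total and the use of base R's ave() in place of dplyr are my own choices, and y is sampled with size = 200 here to avoid relying on recycling.

```r
library(ggplot2)

# Total count per bin within each facet panel, in the original row order.
# ave() applies sum() within every combination of x and PANEL.
bin_total <- function(count, x, PANEL) {
  ave(count, x, PANEL, FUN = sum)
}

set.seed(1)
test <- data.frame(
  x = rnorm(200, mean = 0, sd = 10),
  y = factor(sample(c(0, 1), replace = TRUE, size = 200)),
  type = rep(c("A", "B"), each = 100)
)
threshold <- 20

p <- ggplot(test, aes(x = x)) +
  geom_histogram(
    # after_stat() evaluates the expression on the data computed by the stat
    aes(fill = y, alpha = after_stat(bin_total(count, x, PANEL) > threshold)),
    position = "fill", bins = 10
  ) +
  facet_wrap(~type)
p
```

Because PANEL is part of the grouping, each facet is thresholded on its own totals.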
I want to create a plot using facet_grid(), with free scales for the y axis. However, for each row, the scale breaks should be distributed evenly, that is, with 3 breaks.
I borrowed from this question, but I was not able to adapt the code so that the scale breaks are actually pretty.
However, this is my current approach:
# Packages
library(dplyr)
library(ggplot2)
library(scales)
# Test Data
set.seed(123)
result_df <- data.frame(
variable = rep(c(1,2,3,4), each = 4),
mode = rep(c(1,2), each = 2),
treat = rep(c(1,2)) %>% as.factor(),
mean = rnorm(16, mean = .7, sd = 0.2),
x = abs(rnorm(16, mean = 0, sd = 0.5))) %>%
mutate(lower = mean - x,upper = mean + x)
# Function for equal breaks, lended from
equal_breaks <- function(n = 3, s = 0.05, ...) {
  function(x) {
    d <- s * diff(range(x)) / (1+2*s)
    round(seq(min(x)+d, max(x)-d, length=n), 2)
  }
}
## Plot
result_df %>%
ggplot(aes(y = mean*100, x = treat)) +
geom_pointrange(aes(ymin = lower*100, ymax = upper*100), shape = 20) +
facet_grid(variable ~ mode, scales = "free_y")+
scale_y_continuous(breaks = equal_breaks(n = 3, s = .2))+
labs(x = "", y = "")
This leads to the current plot. As one can see, the breaks are far from reasonable.
Thanks in advance for any recommendations, and please excuse me in case I have missed an already existing solution.
Best, Malte
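To see why the labels come out uneven, it may help to evaluate the helper on a single range outside ggplot2. The sketch below uses a hypothetical panel range of 0 to 100 (panel_range is not from the original post) and compares the result with base R's pretty(), which prefers round numbers but does not guarantee an exact number of breaks.

```r
equal_breaks <- function(n = 3, s = 0.05, ...) {
  function(x) {
    # inset the outer breaks by a fraction of the range
    d <- s * diff(range(x)) / (1 + 2 * s)
    round(seq(min(x) + d, max(x) - d, length.out = n), 2)
  }
}

panel_range <- c(0, 100)  # hypothetical y range of one facet row

# Evenly spaced, but placed at arbitrary values
even <- equal_breaks(n = 3, s = 0.2)(panel_range)
even
# 14.29 50.00 85.71

# Round values; n is only a target here, not a guarantee
round_breaks <- pretty(panel_range, n = 3)
round_breaks
```

The helper spaces the breaks evenly inside the range, so they only land on round numbers by coincidence.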
I have a plot like this:
Which was created with this code:
# Make data:
set.seed(42)
n <- 1000
df <- data.frame(values = sample(0:5, size = n, replace = T, prob = c(9/10, rep(0.0167,5))),
group = rep(1:100, each = 10),
fill2 = rep(rnorm(10), each = 100),
year = rep(2001:2010, times = 100)
)
df$values <- ifelse(df$year %in% 2001:2007 == T, 0, df$values)
# Plot
require(ggplot2)
p <- ggplot(data = df, aes(x = year, y = values, colour = as.factor(group))) + geom_line()
p
Since there are so many groups, the legend is really not helpful.
Ideally I would like just two elements in the legend: one for group = 1 and one for all the other groups (which should all have the same color). Is there a way to force this?
You can define a new variable that has only two values, but still plot lines according to their original group:
ggplot(data = df, aes(x = year, y = values, group = group,
colour = ifelse(group == 1, "1", "!1"))) +
geom_line() +
scale_colour_brewer("groups", palette="Set1")
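If you want readable legend labels and explicit colours, one possible variant is to store the two-valued variable as a column and map it with scale_colour_manual. This is a sketch: the column name highlight, the label text and the colours are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  values = sample(0:5, size = 1000, replace = TRUE, prob = c(9/10, rep(0.0167, 5))),
  group  = rep(1:100, each = 10),
  year   = rep(2001:2010, times = 100)
)

# Two-valued variable used only for colour; lines are still drawn per group
df$highlight <- ifelse(df$group == 1, "group 1", "all other groups")

p <- ggplot(df, aes(x = year, y = values, group = group, colour = highlight)) +
  geom_line() +
  scale_colour_manual(
    name = "groups",
    values = c("group 1" = "red", "all other groups" = "grey70")
  )
p
```

Keeping group in the group aesthetic is what stops the 99 other series from being joined into a single line.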
I am having an issue producing a side-by-side bar plot of two datasets in R. I previously used the code below to create a plot which had corresponding bars from each of two datasets juxtaposed side by side, with columns from dataset 1 colored red and from dataset 2 colored blue. Now when I run the same code on any pair of datasets, including the originals which are still untouched in my saved workspace, I get separate plots for each dataset, side by side, in which individual columns alternate between red and blue between bins from the dataset. Documentation is not giving (me) any (obvious) clues as to what I've done to change the display. Please help!
## Sample data
set.seed(47)
BG.restricted.hs = round(runif(100, min = 47, max = 1660380))
FG.hs = round(runif(100, min = 0, max = 1820786))
BG.restricted.hs <- data.matrix(BG.restricted.hs, rownames.force = NA)
groups.bg.restricted.hs <- cut(x=BG.restricted.hs, breaks = seq(from = 0, to = 1900000, by = 10000))
rowsums.bg.restricted.hs <- tapply(BG.restricted.hs, groups.bg.restricted.hs, sum)
norm.bg.restricted.hs <- (rowsums.bg.restricted.hs / nrow(BG.restricted.hs))
FG.hs <- data.matrix(FG.hs, rownames.force = NA)
groups.fg.hs <- cut(x=FG.hs, breaks = seq(from = 0, to = 1900000, by = 10000))
rowsums.fg.hs <- tapply(FG.hs, groups.fg.hs, sum)
norm.fg.hs <- (rowsums.fg.hs / nrow(FG.hs))
data <- cbind(norm.fg.hs, norm.bg.restricted.hs)
barplot(height = data, xlab = "TSS Distance", ylab = "Density", col=c("red","blue"), beside = TRUE)
Data files contain only a single column of integers.
See if this is more or less what you want. It uses ggplot2, but could be adapted for barplot if you prefer:
set.seed(47)
BG.restricted.hs = round(runif(100, min = 47, max = 1660380))
FG.hs = round(runif(100, min = 0, max = 1820786))
We combine the vectors into one column (keeping track of their source in another column) so that we can simultaneously bin both of them.
dat = data.frame(x = c(BG.restricted.hs, FG.hs),
source = c(rep("BG", length(BG.restricted.hs)),
rep("FG", length(FG.hs))))
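# Use fixed breaks that cover the full range, so that no value falls outside a bin and becomes NA
dat$bin = cut(dat$x, breaks = seq(from = 0, to = 1900000, by = 10000), include.lowest = TRUE)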
Plot:
library(ggplot2)
ggplot(dat, aes(x = bin, fill = source)) +
geom_bar(position = "dodge") +
theme_bw() +
scale_x_discrete(breaks = NULL)
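For a base barplot version, the key point is that with beside = TRUE the values in each column of the height matrix are drawn next to each other, so the matrix needs one row per dataset and one column per bin. The following is a sketch under assumptions: it plots counts per bin rather than the normalised sums from the question, and the wider bins (by = 100000) are chosen only to keep the plot readable.

```r
set.seed(47)
BG.restricted.hs <- round(runif(100, min = 47, max = 1660380))
FG.hs <- round(runif(100, min = 0, max = 1820786))

dat <- data.frame(
  x = c(BG.restricted.hs, FG.hs),
  source = rep(c("BG", "FG"), times = c(length(BG.restricted.hs), length(FG.hs)))
)

# Assumption: wider bins than in the question, purely for readability
dat$bin <- cut(dat$x, breaks = seq(from = 0, to = 1900000, by = 100000),
               include.lowest = TRUE)

# Rows = source, columns = bins
counts <- table(dat$source, dat$bin)

barplot(counts, beside = TRUE, col = c("blue", "red"),
        xlab = "TSS Distance", ylab = "Count", names.arg = rep("", ncol(counts)),
        legend.text = rownames(counts))
```

A matrix built with cbind() from the two per-bin vectors has the opposite orientation (bins in rows), so passing t(data) to barplot may be all the original code needs.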
I am currently making a histogram with three different variables x,y and z using the following code:
require(ggplot2)
require(reshape2)
set.seed(1)
df <- data.frame(x = rnorm(n = 1000, mean = 2, sd = 0.2),
y = rnorm(n = 1000, mean = 2),
z = rnorm(n = 1000, mean = 2))
ggplot(melt(df), aes(value, fill = variable)) + geom_histogram(position = "dodge")
The code works fine, but I want to change the colors of the three different histograms and I'm not really sure how to do this in this specific case. Maybe to something like red, black and green for instance.
Thanks
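One way to set the colours is a manual fill scale. The sketch below uses base R's stack() instead of reshape2::melt only to keep the example free of extra packages, so the long-format columns are called values and ind here rather than value and variable. The bins = 30 argument is an illustrative choice as well.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(n = 1000, mean = 2, sd = 0.2),
                 y = rnorm(n = 1000, mean = 2),
                 z = rnorm(n = 1000, mean = 2))

# Long format: one row per observation, `ind` records the source column
long <- stack(df)

# Named vector so each variable gets a fixed colour
fill_colours <- c(x = "red", y = "black", z = "green")

p <- ggplot(long, aes(values, fill = ind)) +
  geom_histogram(position = "dodge", bins = 30) +
  scale_fill_manual(name = "variable", values = fill_colours)
p
```

With the melted data from the question, adding the same scale_fill_manual() call to the existing plot should have the same effect.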