I am having an issue producing a side-by-side bar plot of two datasets in R. I previously used the code below to create a plot which had corresponding bars from each of two datasets juxtaposed side by side, with columns from dataset 1 colored red and from dataset 2 colored blue. Now when I run the same code on any pair of datasets, including the originals which are still untouched in my saved workspace, I get separate plots for each dataset, side by side, in which individual columns alternate between red and blue between bins from the dataset. Documentation is not giving (me) any (obvious) clues as to what I've done to change the display. Please help!
## Sample data
set.seed(47)
BG.restricted.hs = round(runif(100, min = 47, max = 1660380))
FG.hs = round(runif(100, min = 0, max = 1820786))
BG.restricted.hs <- data.matrix(BG.restricted.hs, rownames.force = NA)
groups.bg.restricted.hs <- cut(x=BG.restricted.hs, breaks = seq(from = 0, to = 1900000, by = 10000))
rowsums.bg.restricted.hs <- tapply(BG.restricted.hs, groups.bg.restricted.hs, sum)
norm.bg.restricted.hs <- (rowsums.bg.restricted.hs / nrow(BG.restricted.hs))
FG.hs <- data.matrix(FG.hs, rownames.force = NA)
groups.fg.hs <- cut(x=FG.hs, breaks = seq(from = 0, to = 1900000, by = 10000))
rowsums.fg.hs <- tapply(FG.hs, groups.fg.hs, sum)
norm.fg.hs <- (rowsums.fg.hs / nrow(FG.hs))
data <- cbind(norm.fg.hs, norm.bg.restricted.hs)
barplot(height = data, xlab = "TSS Distance", ylab = "Density", col=c("red","blue"), beside = TRUE)
Data files contain only a single column of integers.
See if this is more or less what you want. It uses ggplot2, but could be adapted for barplot if you prefer:
set.seed(47)
BG.restricted.hs = round(runif(100, min = 47, max = 1660380))
FG.hs = round(runif(100, min = 0, max = 1820786))
We combine the vectors into one column (keeping track of their source in another column) so that we can simultaneously bin both of them.
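dat = data.frame(x = c(BG.restricted.hs, FG.hs),
                 source = c(rep("BG", length(BG.restricted.hs)),
                            rep("FG", length(FG.hs))))
# Fixed breaks from 0 to 1,900,000 (as in the question) cover every value;
# breaks running from min(dat$x) to max(dat$x) would leave the smallest value,
# and any value above the last break, unbinned (NA)
dat$bin = cut(dat$x, breaks = seq(from = 0, to = 1900000, by = 10000))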
Plot:
library(ggplot2)
ggplot(dat, aes(x = bin, fill = source)) +
  geom_bar(position = "dodge") +
  theme_bw() +
  scale_x_discrete(breaks = NULL)
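If you prefer base graphics, one way to adapt this is to note that barplot() with beside = TRUE draws each column of the height matrix as one group of bars, with one bar per row. A matrix with one row per dataset and one column per bin therefore puts the two datasets side by side within each bin. A minimal sketch, assuming the same simulated vectors and the bin edges from the question. It plots raw counts per bin, as the ggplot2 version does, and the names breaks, fg.counts, bg.counts and counts are only illustrative:

```r
# Assumption: same simulated vectors and bin edges as in the question
set.seed(47)
BG.restricted.hs <- round(runif(100, min = 47, max = 1660380))
FG.hs <- round(runif(100, min = 0, max = 1820786))
breaks <- seq(from = 0, to = 1900000, by = 10000)

# Count observations per bin for each dataset
fg.counts <- table(cut(FG.hs, breaks = breaks))
bg.counts <- table(cut(BG.restricted.hs, breaks = breaks))

# One ROW per dataset, one COLUMN per bin: with beside = TRUE, barplot()
# draws each column as a group, so the two datasets sit side by side
counts <- rbind(FG = as.vector(fg.counts), BG = as.vector(bg.counts))

barplot(height = counts, beside = TRUE, col = c("red", "blue"),
        xlab = "TSS Distance", ylab = "Count",
        legend.text = rownames(counts))
```

Passing the transposed layout instead (one column per dataset, as cbind() produces) gives one block of bars per dataset with the colors alternating inside it.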
The authors of this paper (https://www.sciencedirect.com/science/article/pii/S0092867415006418) mention in their supplementary file that their figures were produced in Matlab. I am not proficient in Matlab, have no time to learn it, and do not have a license, so I am trying to replicate the figure below (Figure 2 of the paper, specifically Figure 2A on the left) in R:
Any suggestions? What is this plot called more generally?
Thank you!
To me it looks like a classic point plot! You can reproduce this kind of plot in R with ggplot:
# Fake dataframe with xy coordinates, type of data (for the coloring), pvalue (for size), and different panel
df <- data.frame(
  x = rep(1:20, 10),
  y = rnorm(200, mean = 0, sd = 2),
  type = rep(rep(LETTERS[1:5], each = 4), 10),
  pvalue = sample(0:50, size = 200, replace = TRUE)/1000,
  # shuffle the 200 panel labels (50 per panel) without replacement
  panel = sample(rep(paste0("panel", 1:4), each = 50), 200, replace = FALSE))
# plot
library(ggplot2)
ggplot(df, aes(x, y*x , color = type, size = pvalue)) + geom_hline(yintercept = 0) + geom_point() + facet_wrap(~panel, ncol = 2)
ggsave("demo.png")
I want to generate a single figure that displays all the scatter plots built from two data frames (i.e., regressing column A of Data1 against column A of Data2, and so on). Each plot in the figure should show the R-squared and p-value. I am mainly interested in how to use the facet_wrap function of ggplot while pulling data from multiple data frames.
I tried a couple of methods but did not succeed.
library(tidyverse)
Data1=data.frame(A=runif(20, min = 0, max = 100), B=runif(20, min = 0, max = 250), C=runif(20, min = 0, max = 300))
Data2=data.frame(A=runif(20, min = -10, max = 50), B=runif(20, min = -5, max = 150), C=runif(20, min = 5, max = 200))
#method-1: using plot functions
par(mfrow=c(3,1))
plot(Data1$A, Data2$A)
# regress y (Data2) on x (Data1) so the line matches the plotted axes
abline(lm(Data2$A ~ Data1$A))
plot(Data1$B, Data2$B)
abline(lm(Data2$B ~ Data1$B))
plot(Data1$C, Data2$C)
abline(lm(Data2$C ~ Data1$C))
dev.off()
#method-2: using ggplot
ggplot()+
geom_point(aes(Data1$A,Data2$A))
I want a Figure like the one below
The hardest part is tidying up your data. Once that's done, the plot is pretty straightforward.
library(tidyverse)
Data1=data.frame(A=runif(20, min = 0, max = 100), B=runif(20, min = 0, max = 250), C=runif(20, min = 0, max = 300))
Data2=data.frame(A=runif(20, min = -10, max = 50), B=runif(20, min = -5, max = 150), C=runif(20, min = 5, max = 200))
data <- Data1 %>%
  #add columns to indicate the source and the observation number
  mutate(source = "Data1",
         obs = row_number()) %>%
  #bind to Data2 with the same new columns
  bind_rows(Data2 %>% mutate(source = "Data2", obs = row_number())) %>%
  #tidy the data so we've got a column for Data1 and Data2 and an indicator for the series (A, B, C)
  gather(A, B, C, key = series, value = value) %>%
  spread(key = source, value = value)
#create a separate data frame for annotations, finding the "top left" corner of each series
annotations <- data %>%
  group_by(series) %>%
  summarise(x = min(Data1),
            y = max(Data2)) %>%
  mutate(label = c("P = 0.6", "P = 0.5", "P = 0.9"))
#plot the data, faceting by series
data %>%
  ggplot(aes(Data1, Data2))+
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_grid(series~., scales = "free") +
  #add the annotations with adjustments to the horiz & vert placement
  geom_text(data = annotations, aes(x = x, y = y, label = label, hjust = 0, vjust = 1),
            color = "red", fontface = "italic")
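The labels above are typed in by hand. If you want the actual R-squared and p-value on each panel, one option is to fit lm() per series and build the label from summary(). This is only a sketch under my own assumptions: I added a seed, reshaped with base R instead of gather/spread, and the helper fit_label and the data frame long are names I made up for illustration.

```r
library(ggplot2)

# Assumption: same simulated inputs as above, with a seed added for reproducibility
set.seed(1)
Data1 <- data.frame(A = runif(20, 0, 100), B = runif(20, 0, 250), C = runif(20, 0, 300))
Data2 <- data.frame(A = runif(20, -10, 50), B = runif(20, -5, 150), C = runif(20, 5, 200))

# Long format: one row per observation, paired by column name (A with A, ...)
long <- data.frame(series = rep(names(Data1), each = nrow(Data1)),
                   Data1 = unlist(Data1, use.names = FALSE),
                   Data2 = unlist(Data2, use.names = FALSE))

# Hypothetical helper: fit Data2 ~ Data1 for one series and return one label row,
# positioned at that series' top-left data corner
fit_label <- function(d) {
  s <- summary(lm(Data2 ~ Data1, data = d))
  data.frame(series = d$series[1],
             x = min(d$Data1),
             y = max(d$Data2),
             label = sprintf("R2 = %.2f, p = %.3f",
                             s$r.squared, s$coefficients["Data1", "Pr(>|t|)"]))
}
annotations <- do.call(rbind, lapply(split(long, long$series), fit_label))

ggplot(long, aes(Data1, Data2)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_grid(series ~ ., scales = "free") +
  geom_text(data = annotations, aes(x = x, y = y, label = label),
            hjust = 0, vjust = 1, color = "red", fontface = "italic")
```

For a regression with a single predictor, the p-value of the slope is the same as the p-value of the overall F-test, so either can be reported.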
You can make a list of plots and then arrange them with the grid.arrange() function from the gridExtra package.
sc_plots = list()
sc_plots$sc1 = ggplot() + ...
sc_plots$sc2 = ggplot() + ...
grid.arrange(sc_plots$sc1, sc_plots$sc2,
ncol = 3)
@Jordo82, here is what I get when I try to insert the text on the figures. Is there a way to free up the y-axis so that the added text does not depend on the y-scale but instead appears in the top-left corner of each plot? I used annotation_custom because it does not depend on the y-scale, but the downside is that it takes only the first text in the labels. My real values are very different from each other; see the y-scale of the attached figure.
I used your code but edited the placement coordinates:
annotate("text", -1.5, 800, label = c("P = 0.6", "P = 0.5", "P = 0.9", "P = 0.9"),
color = "red", fontface = "italic")
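One way to make the position independent of the y-scale is to give geom_text() infinite coordinates: x = -Inf and y = Inf are clamped to the left and top edges of each panel, and hjust/vjust then move the text inward. A minimal sketch with made-up data (the data frames df and labels and the panel names are hypothetical, and the label text is a placeholder):

```r
library(ggplot2)

# Hypothetical data: four panels whose y-ranges differ by orders of magnitude
set.seed(1)
df <- data.frame(panel = rep(c("A", "B", "C", "D"), each = 20),
                 x = rnorm(80),
                 y = rnorm(80) * rep(c(1, 10, 100, 1000), each = 20))

# One label per panel, matched to the facets through the shared "panel" column
labels <- data.frame(panel = c("A", "B", "C", "D"),
                     label = c("P = 0.6", "P = 0.5", "P = 0.9", "P = 0.9"))

ggplot(df, aes(x, y)) +
  geom_point() +
  facet_wrap(~ panel, scales = "free_y") +
  # x = -Inf and y = Inf pin the text to the panel's left and top edges;
  # hjust/vjust then nudge it inward
  geom_text(data = labels, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, color = "red", fontface = "italic",
            inherit.aes = FALSE)
```

Because the labels live in their own data frame with the faceting column, each panel gets its own text, which avoids the "only the first label is used" problem.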
I want to create a plot using facet_grid(), with free scales for the y axis. However, for each row, the scale breaks should be distributed evenly, that is, with 3 breaks.
I borrowed code from this question, but I was not able to adapt it so that the scale breaks are actually pretty.
This is my current approach:
# Packages
library(dplyr)
library(ggplot2)
library(scales)
# Test Data
set.seed(123)
result_df <- data.frame(
  variable = rep(c(1,2,3,4), each = 4),
  mode = rep(c(1,2), each = 2),
  treat = rep(c(1,2)) %>% as.factor(),
  mean = rnorm(16, mean = .7, sd = 0.2),
  x = abs(rnorm(16, mean = 0, sd = 0.5))) %>%
  mutate(lower = mean - x, upper = mean + x)
# Function for equal breaks, borrowed from
equal_breaks <- function(n = 3, s = 0.05, ...) {
  function(x) {
    # inset the outer breaks by a fraction of the range, then space n breaks evenly
    d <- s * diff(range(x)) / (1+2*s)
    round(seq(min(x)+d, max(x)-d, length=n), 2)
  }
}
## Plot
result_df %>%
  ggplot(aes(y = mean*100, x = treat)) +
  geom_pointrange(aes(ymin = lower*100, ymax = upper*100), shape = 20) +
  facet_grid(variable ~ mode, scales = "free_y") +
  scale_y_continuous(breaks = equal_breaks(n = 3, s = .2)) +
  labs(x = "", y = "")
This leads to the current plot. As one can see, the breaks are far from reasonable.
Thanks in advance for any recommendations, and please excuse me if I have missed an already existing solution.
Best, Malte
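A possible direction, offered only as a sketch: scale_y_continuous() accepts a function of the scale limits as its breaks argument, and with free y scales that function is evaluated separately for each row. Base R's pretty() returns round values, so wrapping it gives "pretty" breaks per row. Note the assumptions: pretty() treats n as a target and may return a few more or fewer than 3 values, breaks that fall outside a panel's range are not drawn, and pretty_n is a helper name I made up.

```r
library(ggplot2)

# Assumption: same test data as in the question, built without the pipe
set.seed(123)
result_df <- data.frame(
  variable = rep(c(1, 2, 3, 4), each = 4),
  mode     = rep(c(1, 2), each = 2),
  treat    = factor(rep(c(1, 2))),
  mean     = rnorm(16, mean = .7, sd = 0.2),
  x        = abs(rnorm(16, mean = 0, sd = 0.5)))
result_df$lower <- result_df$mean - result_df$x
result_df$upper <- result_df$mean + result_df$x

# Hypothetical helper: ask pretty() for about n round breaks within the limits
pretty_n <- function(n = 3) {
  function(limits) pretty(limits, n = n)
}

ggplot(result_df, aes(y = mean * 100, x = treat)) +
  geom_pointrange(aes(ymin = lower * 100, ymax = upper * 100), shape = 20) +
  facet_grid(variable ~ mode, scales = "free_y") +
  scale_y_continuous(breaks = pretty_n(3)) +
  labs(x = "", y = "")
```

If exactly three breaks per row are required, the equal_breaks approach is still needed, at the cost of non-round values.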
I am trying to generate a scatter plot where the x-axis is several categories of a continuous variable. The closest thing to it would be a Manhattan plot, where the x-axis is split by chromosome (categorical), but within each category the values are continuous.
Data:
chr <- sample(x = c(1,2), replace = T, size = 1000)
bp <- as.integer(runif(n = 1000, min = 0, max = 10000))
p <- runif(n = 1000, min = 0, max = 1)
df <- data.frame(chr,bp,p)
Starting Point:
ggplot(df, aes(y = -log10(p), x =bp)) + geom_point(colour=chr)
The red and black points should be separate categories along the x-axis.
I am not sure if I have understood your question. Probably you are looking for facets. See the example.
require(ggplot2)
chr <- sample(x = c(1,2), replace = T, size = 1000)
bp <- as.integer(runif(n = 1000, min = 0, max = 10000))
p <- runif(n = 1000, min = 0, max = 1)
df <- data.frame(chr,bp,p)
ggplot(df, aes(y = -log10(p), x = bp)) +
  geom_point(aes(colour = factor(chr))) +
  facet_wrap("chr")
If you really want to do this in a single plot instead of facets, you could conditionally rescale your x variable and then manually adjust the labels, e.g.:
library(dplyr)
library(ggplot2)
df %>%
  # shift chromosome 2 to the right of chromosome 1 on a shared axis
  mutate(bp.scaled = ifelse(chr == 2, bp + 10000, bp)) %>%
  ggplot(aes(y = -log10(p), x = bp.scaled)) + geom_point(colour=chr) +
  scale_x_continuous(breaks = seq(0,20000,2500),
                     labels = c(seq(0,10000,2500), seq(2500,10000,2500)))
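The same idea extends to more than two chromosomes if the shift for each one is the total length of the chromosomes before it. A rough sketch under my own assumptions: the chromosome lengths in chr_len are invented, and offset and mid are illustrative names.

```r
library(ggplot2)

# Assumption: hypothetical data with three chromosomes of different lengths
set.seed(1)
chr_len <- c("1" = 10000, "2" = 8000, "3" = 6000)
chr <- sample(names(chr_len), size = 1000, replace = TRUE)
bp  <- floor(runif(1000) * unname(chr_len[chr]))
p   <- runif(1000)
df  <- data.frame(chr, bp, p)

# Offset of each chromosome = total length of the chromosomes before it
offset <- c(0, cumsum(chr_len)[-length(chr_len)])
names(offset) <- names(chr_len)
df$bp.scaled <- df$bp + unname(offset[df$chr])

# Put one axis label per chromosome at the midpoint of its stretch
mid <- unname(offset + chr_len / 2)

ggplot(df, aes(x = bp.scaled, y = -log10(p), colour = chr)) +
  geom_point() +
  scale_x_continuous(breaks = mid, labels = names(chr_len)) +
  labs(x = "Chromosome")
```

Labeling each chromosome once at its midpoint, rather than repeating base-pair ticks, is the layout Manhattan plots usually use.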
Result:
I am making boxplots with ggplot with data that is classified by 2 factor variables. I'd like to have the box sizes reflect sample size via varwidth = TRUE but when I do this the boxes overlap.
1) Some sample data with a 3 x 2 structure
data <- data.frame(group1= sample(c("A","B","C"),100, replace = TRUE),group2= sample(c("D","E"),100, replace = TRUE) ,response = rnorm(100, mean = 0, sd = 1))
2) Default boxplots: ggplot without variable width
ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot()
I like how the first level of grouping is shown.
Now I try to add variable widths...
3) ...and What I get when varwidth = TRUE
ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot(varwidth = T)
This overlap seems to occur whether I use color = group2 or group = group2 in both the main call to ggplot and in the geom_boxplot statement. Fussing with position_dodge doesn't seem to help either.
4) A solution I don't like visually is to make unique factors by combining my group1 and group2
data$grp.comb <- paste(data$group1, data$group2)
ggplot(data = data, aes(y = response, x = grp.comb, color = group2)) + geom_boxplot()
I prefer having things grouped to reflect the cross classification
5) The way forward:
I'd like to either a) make varwidth = TRUE not cause the boxes to overlap, or b) manually adjust the space between the combined groups so that boxes within the first level of grouping are closer together.
I think your problem can be solved best by using facet_wrap.
library(ggplot2)
data <- data.frame(group1 = sample(c("A","B","C"), 100, replace = TRUE),
                   group2 = sample(c("D","E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))
ggplot(data = data, aes(y = response, x = group2, color = group2)) +
  geom_boxplot(varwidth = TRUE) +
  facet_wrap(~group1)
Which gives:
A recent update to ggplot2 makes the code provided by @N Brouwer in (3) work as expected:
# library(devtools)
# install_github("tidyverse/ggplot2")
packageVersion("ggplot2") # works with v2.2.1.9000
library(ggplot2)
set.seed(1234)
data <- data.frame(group1 = sample(c("A","B","C"), 100, replace = TRUE),
                   group2 = sample(c("D","E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))
ggplot(data = data, aes(y = response, x = group1, color = group2)) +
  geom_boxplot(varwidth = TRUE)
(I'm a new user and can't post images inline)
fig 1
This question has been answered here: "ggplot increase distance between boxplots".
The answer involves using the position = position_dodge() argument of geom_boxplot().
For your example:
data <- data.frame(group1 = sample(c("A","B","C"), 100, replace = TRUE),
                   group2 = sample(c("D","E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))
ggplot(data = data, aes(y = response, x = group1, color = group2)) +
  geom_boxplot(position = position_dodge(1))