R ggplot boxplot: order filling variable - r

I am generating a number of boxplots, each for a different marker, filled by a categorical variable: 'CR' and 'No CR'.
I want the left box in the plot to be the 'No CR'-fill and the right one 'CR'. Like this one:
However, for some plots I get it the other way around (left 'CR' and right 'No CR').
How can I control (order?) which filling category is left and which one is right?
Here is part of my code:
head(df)
# ID y CR
# 1 1 0 No CR
# 2 2 0 No CR
# 3 3 0 CR
# 4 4 4 No CR
ggplot(df, aes(x = CR, y = y)) +
geom_boxplot(aes(fill=CR)) +
labs(title="Highly differential peptides") +
scale_fill_manual(values=c("#35978f","#D6604D"))+
stat_compare_means( label.y = maxadn,size=5)

You can relevel your CR variable:
df$CR <- factor(df$CR, levels = c("No CR", "CR"))
and then replot. ggplot2 places discrete categories in the order of the factor levels.
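A minimal sketch of why releveling helps, using a small made-up data frame (the values are illustrative, not the asker's real data). A character column converted to a factor gets alphabetically sorted levels by default, which puts "CR" first; setting the levels explicitly reverses that:

```r
# Hypothetical data mirroring the structure shown in the question
df <- data.frame(
  ID = 1:4,
  y  = c(0, 0, 0, 4),
  CR = c("No CR", "No CR", "CR", "No CR")
)

# Default conversion sorts the levels alphabetically, so "CR" comes first
default_levels <- levels(factor(df$CR))

# Explicit levels put "No CR" first, which is the order ggplot2 will use
df$CR <- factor(df$CR, levels = c("No CR", "CR"))
levels(df$CR)
```

Checking levels(df$CR) before plotting is a quick way to confirm which category will end up on the left.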

It's nice to include a minimal, reproducible example to make it easier to answer your question thoroughly. First I'll simulate some data:
library("ggplot2")
df <- data.frame(
CR = sample(c("CR", "No CR"), 20, replace=TRUE),
y = rpois(20, 2)
)
Then, as explained in this question, you can either set the limits directly:
ggplot(df, aes(x = CR, y = y)) +
geom_boxplot(aes(fill=CR)) +
scale_fill_manual(values=c("#35978f","#D6604D")) +
scale_x_discrete(limits=c("No CR", "CR"))
or use factor levels to control the order:
ggplot(df, aes(x = factor(CR, levels=c("No CR", "CR")), y = y)) +
geom_boxplot(aes(fill=CR)) +
scale_fill_manual(values=c("#35978f","#D6604D")) +
labs(x = "CR")
Without any reordering:
ggplot(df, aes(x = CR, y = y)) +
geom_boxplot(aes(fill=CR)) +
scale_fill_manual(values=c("#35978f","#D6604D"))
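One thing to watch when reordering: an unnamed vector passed to scale_fill_manual() is matched to the categories by position, so the colours can swap when the order changes. A sketch of one way to avoid this is to name the colour vector. The data are simulated, and which colour belongs to which category is an assumption here:

```r
library(ggplot2)

set.seed(1)  # simulated data in the same shape as the answer above
df <- data.frame(
  CR = sample(c("CR", "No CR"), 20, replace = TRUE),
  y  = rpois(20, 2)
)
df$CR <- factor(df$CR, levels = c("No CR", "CR"))

# Names tie each colour to a category regardless of the level order
# (the colour-to-category assignment is assumed, not from the question)
cr_colours <- c("No CR" = "#35978f", "CR" = "#D6604D")

p <- ggplot(df, aes(x = CR, y = y, fill = CR)) +
  geom_boxplot() +
  scale_fill_manual(values = cr_colours)
p
```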

Related

Trouble graphing two columns on one graph in R

I just started learning R. I melted my dataframe and used ggplot to get this graph. There's supposed to be two lines on the same graph, but the lines connecting seem random.
Correct points plotted, but wrong lines.
# Melted my data to create new dataframe
AvgSleep2_DF <- melt(AvgSleep_DF , id.vars = 'SleepDay_Date',
variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
geom_point(aes(colour = series)) +
geom_line(aes(colour = series))
With or without the aes(colour = series) in the geom_line results in the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I assign a deliberate colour column that differs from the series specification!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
Inheritance in ggplot will look for aesthetics defined in an upper layer.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) + # setting the size to stress point layer call
geom_line() # geom_line will "inherit" a "grouping" from the colour set above
This gives you
We can instead control the grouping associated with each line (segment) explicitly:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) +
geom_line(aes(group = series)) # defining specific grouping
Note: As I defined a separate "group" in the series column for the 3rd point, it is depicted - in this case - as a single point "line".
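Applied to melted data like the asker's, the same idea might look like the sketch below. The column names follow the question, but the values, the series names and the date format are assumptions. Whether the date column needs parsing depends on how it was read in:

```r
library(ggplot2)

# Hypothetical long-format data; values and date format are made up
sleep_long <- data.frame(
  SleepDay_Date = rep(c("2016-04-12", "2016-04-13", "2016-04-14"), times = 2),
  series = rep(c("MinutesAsleep", "TimeInBed"), each = 3),
  value  = c(400, 380, 420, 430, 410, 450)
)

# Parse the dates so the x axis is continuous and ordered in time
sleep_long$SleepDay_Date <- as.Date(sleep_long$SleepDay_Date)

p <- ggplot(sleep_long, aes(SleepDay_Date, value, colour = series)) +
  geom_point() +
  geom_line(aes(group = series))  # one line per series
p
```

If the dates are left as text, ggplot2 treats the x axis as discrete, which can be one reason lines do not connect the way you expect.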

How to create two barplots with different x and y axis in the same plot in R?

I need to plot two grouped barplots from two data frames that have different numbers of rows: 6 and 5.
I tried many approaches in R but I don't know how to fix it.
Here are my data frames. The Freq column must be on the y axis, and the inter and intra columns must be on the x axis.
> freqinter
inter Freq
1 0.293040975264367 17
2 0.296736775990729 2
3 0.297619926364764 4
4 0.587377012109561 1
5 0.595245125315916 4
6 0.597022018595893 2
> freqintra
intra Freq
1 0 3
2 0.293040975264367 15
3 0.597022018595893 4
4 0.598809552335782 2
5 0.898227748764939 6
I expect to plot the barplots in the same plot and distinguish the inter and intra values by colour.
I want a picture like this one:
You probably want a histogram. Use the raw data if possible. For example:
library(tidyverse)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id") %>%
uncount(Freq)
ggplot(df, aes(x, fill = id)) +
geom_histogram(binwidth = 0.1, position = 'dodge', col = 1) +
scale_fill_grey() +
theme_minimal()
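For readers unfamiliar with uncount(), here is a small sketch of what that step does, using the intra counts from the question. It repeats each row Freq times so that a histogram can count the rows itself. The base R line is an equivalent shown only for comparison:

```r
library(tidyr)

# Same values and counts as freqintra in the question
freqintra <- data.frame(
  x = c(0, 0.293040975264367, 0.597022018595893,
        0.598809552335782, 0.898227748764939),
  Freq = c(3, 15, 4, 2, 6)
)

# uncount() repeats each row Freq times and drops the Freq column
expanded <- uncount(freqintra, Freq)
nrow(expanded)  # 3 + 15 + 4 + 2 + 6 = 30 rows

# Base R equivalent of the same expansion
row_index <- rep(seq_len(nrow(freqintra)), freqintra$Freq)
expanded_base <- freqintra[row_index, "x", drop = FALSE]
```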
With the data you posted, I don't think you can make this graph look good. You can't have bars thin enough to differentiate 0.293 and 0.296 when your data ranges from 0 to 0.9.
Maybe you could try to treat it as a factor just to illustrate what you want to do:
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id")
ggplot(df, aes(x = as.factor(x), y = Freq, fill = id)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
You can also check the problem by not treating your x variable as a factor:
ggplot(df, aes(x = x, y = Freq, fill = id)) +
geom_bar(stat = "identity", width = 0.05, position = "dodge") +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
Either the bars must be very thin (small width), or you'll get overlapping x intervals breaking the plot.

Add pair lines in R

I have some data measured pair-wise (e.g. 1C, 1M, 2C and 2M), which I have plotted separately (as C and M). However, I would like to add a line between each pair (e.g. a line from point 1 in the C column to point 1 in the M 'column').
A small section of the entire dataset:
PairNumber Type M
1 M 0.117133
2 M 0.054298837
3 M 0.039734
4 M 0.069247069
5 M 0.043053957
1 C 0.051086898
2 C 0.075519
3 C 0.065834198
4 C 0.084632915
5 C 0.054254946
I have generated the below picture using the following tiny R snippet:
boxplot(test$M ~ test$Type)
stripchart(test$M ~ test$Type, vertical = TRUE, method="jitter", add = TRUE, col = 'blue')
Current plot:
I would like to know what command or what function I would need to achieve this (a rough sketch of the desired result, with only some of the lines, is presented below).
Desired plot:
Alternatively, doing this with ggplot is also fine by me, I have the following alternative ggplot code to produce a plot similar to the first one above:
ggplot(,aes(x=test$Type, y=test$M)) +
geom_boxplot(outlier.shape=NA) +
geom_jitter(position=position_jitter(width=.1, height=0))
I have been trying geom_path, but I have not found the correct syntax to achieve what I want.
I would probably recommend breaking this up into multiple visualizations -- with more data, I feel this type of plot would become difficult to interpret. In addition, I am not sure it's possible to draw the geom_lines and connect them with the additional call to geom_jitter. That being said, this gets you most of the way there:
ggplot(df, aes(x = Type, y = M)) +
geom_boxplot(outlier.shape = NA) +
geom_line(aes(group = PairNumber)) +
geom_point()
The trick is to specify your group aesthetic within geom_line() and not up top within ggplot().
Additional Note: No reason to fully qualify your aesthetic variables within ggplot() -- that is, no reason to do ggplot(data = test, aes(x = test$Type, y = test$M)); rather, just use: ggplot(data = test, aes(x = Type, y = M)).
UPDATE
Leveraging cowplot to visualize this data in different plots could prove helpful:
library(cowplot)
p1 <- ggplot(df, aes(x = Type, y = M, color = Type)) +
geom_boxplot()
p2 <- ggplot(df, aes(x = Type, y = M, color = Type)) +
geom_jitter(position = position_jitter(width = 0.1, height = 0))
p3 <- ggplot(df, aes(x = M, color = Type, fill = Type)) +
geom_density(alpha = 0.5)
p4 <- ggplot(df, aes(x = Type, y = M)) +
geom_line(aes(group = PairNumber, color = factor(PairNumber)))
plot_grid(p1, p2, p3, p4, labels = c(LETTERS[1:4]), align = "v")
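On the question of connecting lines to jittered points: one approach that may work is to give both layers the same position_jitter() object with a fixed seed, so that they receive identical offsets. This is a sketch using the data from the question; it relies on both layers seeing the same rows in the same order, so it is worth checking the result visually on your ggplot2 version:

```r
library(ggplot2)

# Data from the question
test <- data.frame(
  PairNumber = rep(1:5, times = 2),
  Type = rep(c("M", "C"), each = 5),
  M = c(0.117133, 0.054298837, 0.039734, 0.069247069, 0.043053957,
        0.051086898, 0.075519, 0.065834198, 0.084632915, 0.054254946)
)

# One shared position object with a fixed seed (the seed value is arbitrary),
# so the line layer and the point layer are shifted by the same amounts
pj <- position_jitter(width = 0.1, height = 0, seed = 42)

p <- ggplot(test, aes(x = Type, y = M)) +
  geom_boxplot(outlier.shape = NA) +
  geom_line(aes(group = PairNumber), position = pj, colour = "grey50") +
  geom_point(position = pj, colour = "blue")
p
```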

In ggplot2, geom_text() labels are misplaced below my data points (as pictured). How to overlay them onto points?

I'm using ggplot2 to create a simple dot plot of -1 to +1 correlation values using the following R code:
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y= row.names(dataframe))) +
geom_text(aes(y=exit, label=samplesize))
The y-axis has text labels, and I believe those text labels may be the reason that my geom_text() data point labels are squished down into the bottom of the plot as pictured here:
How can I change my plotting so that the data point labels appear on the dots themselves?
I understand that you would like to have the samplesize appear above each data point in the plot. Here is a sample plot with a sample data frame that does this:
EDIT: Per note by Gregor, changed the geom_text() call to utilize aes() when referencing the data. Thanks for the heads up!
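# read.table() uses the first column as row names because the header has one fewer field
top10_rank <- read.table(header = TRUE, text = "
   String Number
4       h      0
1       a      1
11      w      1
3       z      3
7       z      3
2       b      4
8       q      5
6       k      6
9       r      9
5       x     10
10      l     11
")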
x<-ggplot(data=top10_rank, aes(x = Number,
y = String)) + geom_point(size=3) + scale_y_discrete(limits=top10_rank$String)
x + geom_text(data=top10_rank, size=5, color = 'blue',
aes(x = Number,label = Number), hjust=0, vjust=0)
Not sure if this is what you wanted though.
Your problem is simply that you switched the y variables:
# your code
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y = row.names(dataframe))) + # here y is the row names
geom_text(aes(y =exit, label = samplesize)) # here y is the exit column
Since you want the same y-values for both you can define this in the initial ggplot() call and not worry about repeating it later
# working version
ggplot(dataframe, aes(x = exit, y = row.names(dataframe))) +
geom_point() +
geom_text(aes(label = samplesize))
Using row names is a little fragile. It's safer and more robust to create a data column with what you want for the y values:
# nicer code
dataframe$y = row.names(dataframe)
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(label = samplesize))
Having done this, you probably don't want the labels right on top of the points, maybe a little offset would be better:
# best of all?
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(x = exit + .05, label = samplesize), vjust = 0)
In the last case, you'll have to play with the adjustment to the x aesthetic. What looks right will depend on the dimensions of your final plot.
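As an alternative to adding an offset inside aes(), geom_text() also accepts a nudge_x argument, which shifts the labels in data units. A sketch with a hypothetical data frame (the column names follow the question, the values are made up):

```r
library(ggplot2)

# Hypothetical data: column names follow the question, values are made up
dataframe <- data.frame(
  y = c("site_a", "site_b", "site_c"),
  exit = c(-0.4, 0.1, 0.7),
  samplesize = c(12, 30, 25)
)

p <- ggplot(dataframe, aes(x = exit, y = y)) +
  geom_point() +
  # nudge_x moves each label to the right; hjust = 0 left-aligns the text
  geom_text(aes(label = samplesize), nudge_x = 0.05, hjust = 0)
p
```

The offset of 0.05 is only a starting point and will likely need tuning for the final plot size.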

ggplot2 line plot order

I have a series of ordered points as shown below:
However when I try to connect the points by a line, I get the following output:
The plot is connecting 26 to 1 and 25 to 9 and 10 (some of the errors), instead of following the order. The code for plotting the points is given below:
p<-ggplot(aes(x = x, y = y), data = spat_loc)
p<-p + labs(x = "x Coords (Km)", y="Y coords (Km)") +ggtitle("Locations")
p<-p + geom_point(aes(color="Red",size=2)) + geom_text(aes(label = X))
p + theme_bw()
Plotting code:
p +
geom_line((aes(x=x, y=y)),colour="blue") +
theme_bw()
The file which contains the locations has the following structure:
X x y
1 210 200
.
.
.
where X is the numeric ID and x and y are the pair of co-ordinates.
What do I need to do to make the line follow the ordering of points?
geom_path() will join points in the original order, so you can order your data in the way you want it joined, and then just do + geom_path(). Here's some dummy data:
dat <- data.frame(x = sample(1:10), y = sample(1:10), order = sample(1:10))
ggplot(dat[order(dat$order),], aes(x, y)) + geom_point() + geom_text(aes(y = y + 0.25,label = order)) +
geom_path()
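Applied to data shaped like the asker's file, this could look like the sketch below. Only the first row (1, 210, 200) comes from the question; the other rows are made up, and it is assumed that the numeric ID column X defines the intended order:

```r
library(ggplot2)

# Hypothetical locations, deliberately stored out of order
spat_loc <- data.frame(
  X = c(3, 1, 4, 2),
  x = c(260, 210, 240, 230),
  y = c(180, 200, 150, 220)
)

# Sort rows by ID so geom_path() joins 1 -> 2 -> 3 -> 4
spat_loc <- spat_loc[order(spat_loc$X), ]

p <- ggplot(spat_loc, aes(x = x, y = y)) +
  geom_point(colour = "red", size = 2) +
  geom_text(aes(label = X), vjust = -1) +
  geom_path(colour = "blue") +
  theme_bw()
p
```

geom_line() would instead connect the points in order of their x values, which is why the original plot jumped between IDs.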
