ggplot multiple lines in same graph - r

I am trying to plot the expression of multiple genes over time in the same graph to show that they have a similar profile, and then add a line showing the mean across all genes at each time point (like Figure 4b in a recent Nature Communications article: https://www.nature.com/articles/s41467-017-02546-5/figures/4). My data have been normalised to be centred around 0, so they are all on the same scale.
df2 sample:
variable value gene
1 5 -0.610384193 1
2 5 -6.25967087 2
3 5 -3.773389731 3
50 6 -0.358879035 1
51 6 -6.066341017 2
52 6 -4.202998579 3
99 7 -0.103885903 1
100 7 -6.648844687 2
101 7 -5.041554127 3
I plot the expression levels with ggplot2:
plotC <- ggplot(df2, aes(x=variable, y=value, group=factor(gene), colour=gene)) + geom_line(size=0.5, aes(color=gene), alpha=0.4)
But adding the mean line in red to this plot is proving difficult. I calculated the means and put them in another dataframe:
means
value variable gene
1 -1.5037354 5 50
2 -0.8783492 6 50
3 -0.7769085 7 50
Then tried adding them as another layer:
plotC + geom_line(data=means, aes(x=variable, y=value, color="red", group=factor(gene)), size=0.75)
But I get the error: Error: Discrete value supplied to continuous scale
Do you have any suggestions as to how I can plot this mean on the same graph in another color?
Thank you,
Anna
Edit: the answer by RG20 is helpful; thanks for pointing out that I had the colour in the wrong place. However, it plots the line outside the rest of the graph... I really don't understand what's wrong with my graph.

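Set color outside aes(), so that "red" is a fixed setting rather than a value mapped onto the plot's existing (continuous) colour scale:
plotC + geom_line(data=means, aes(x=variable, y=value, group=factor(gene)), color='red', size=0.75)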
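A minimal self-contained sketch of the whole figure, using the nine df2 rows shown in the question as toy data and assuming ggplot2 3.4.0 or later (where line thickness is set with linewidth; use size on older versions). The means are computed with aggregate() from the same data frame as the gene lines, so at each time point the red line lies between the lowest and highest gene; if it lands outside them in the real plot, it is worth checking how the means data frame was built. gene is wrapped in factor() so the colour scale is discrete, and red is a fixed setting outside aes(). The stat_summary() variant lets ggplot compute the means itself.

```r
library(ggplot2)

# Toy data: the nine rows of df2 shown in the question
# (time point, normalised expression, gene id)
df2 <- data.frame(
  variable = rep(5:7, each = 3),
  value    = c(-0.610384193, -6.25967087, -3.773389731,
               -0.358879035, -6.066341017, -4.202998579,
               -0.103885903, -6.648844687, -5.041554127),
  gene     = rep(1:3, times = 3)
)

# Mean expression at each time point, computed from the same data
means <- aggregate(value ~ variable, data = df2, FUN = mean)

plotC <- ggplot(df2, aes(x = variable, y = value)) +
  # one faint line per gene; factor() makes the colour scale discrete
  geom_line(aes(group = factor(gene), colour = factor(gene)),
            linewidth = 0.5, alpha = 0.4) +
  # mean line: colour is a fixed setting, so it sits outside aes()
  geom_line(data = means, colour = "red", linewidth = 0.75)
plotC

# Alternative: let ggplot compute the per-time-point mean itself
plotC2 <- ggplot(df2, aes(x = variable, y = value)) +
  geom_line(aes(group = factor(gene), colour = factor(gene)), alpha = 0.4) +
  stat_summary(fun = mean, geom = "line", colour = "red", linewidth = 0.75)
plotC2
```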

Related

Plot frequency heatmap of positions from set of coordinates

I have a bunch of data that looks like this:
Track X1 X Y
1 Point 1 147.8333 258.5000
2 Point 2 148.5000 258.8333
3 Point 3 151.1667 260.8333
4 Point 4 154.5000 264.5000
5 Point 5 158.1667 266.5000
6 Point 6 161.5000 269.5000
I want to plot a heatmap of this: a nice-looking graph with x and y axes for the position coordinates, a gradient colour fill indicating how often a particular point occurs, and a scale showing what the colours mean. I'm looking for a simple gradient fill with a single colour for low and another for high.
I've been at this for a while, and I think the first step should be to construct another data set with the positions and a new column showing the frequencies, but I'm not 100% sure how to structure this.
So far my attempts look similar to:
ggplot(data=all_data, aes(x=X, y=Y)) + geom_tile(aes(fill=all_data$X)) +
scale_fill_gradient2(low="green", high="blue") + coord_equal()
As Jon Spring suggested, the following code produces a heatmap of binned counts:
library(ggplot2)
all_data <- read.table(text = "
Track X1 X Y
1 Point 1 147.8333 258.5000
2 Point 2 148.5000 258.8333
3 Point 3 151.1667 260.8333
4 Point 4 154.5000 264.5000
5 Point 5 158.1667 266.5000
6 Point 6 161.5000 269.5000
", header = TRUE, row.names = NULL)
ggplot(data=all_data, aes(x=X, y=Y)) + geom_bin2d()
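The question's idea of first building a table of positions with a frequency column can be sketched like this. Assumption: points are snapped to a square grid whose cell size (cell, 5 units here) is an arbitrary, hypothetical choice; with raw coordinates almost every point is unique, so every cell would have a count of 1. scale_fill_gradient() gives the single low-to-high colour gradient asked for (scale_fill_gradient2() is a diverging scale with a midpoint), and its legend shows what the colours mean.

```r
library(ggplot2)

# X/Y coordinates from the question's sample
all_data <- data.frame(
  X = c(147.8333, 148.5000, 151.1667, 154.5000, 158.1667, 161.5000),
  Y = c(258.5000, 258.8333, 260.8333, 264.5000, 266.5000, 269.5000)
)

# Hypothetical cell size: each position is snapped to the centre of a
# cell x cell grid square so that nearby points are counted together
cell <- 5
all_data$Xbin <- floor(all_data$X / cell) * cell + cell / 2
all_data$Ybin <- floor(all_data$Y / cell) * cell + cell / 2

# One row per occupied grid cell, with a column counting its points
freq <- aggregate(X ~ Xbin + Ybin, data = all_data, FUN = length)
names(freq)[names(freq) == "X"] <- "freq"

p <- ggplot(freq, aes(x = Xbin, y = Ybin, fill = freq)) +
  geom_tile(width = cell, height = cell) +
  scale_fill_gradient(low = "lightyellow", high = "darkblue",
                      name = "Frequency") +
  coord_equal() +
  labs(x = "X", y = "Y")
p
```

geom_bin2d(binwidth = c(5, 5)) combined with the same scale_fill_gradient() call does the binning and counting in a single step instead.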

Reordering legend while modifying one particular line for a line chart in ggplot

Let's say I have a simple data frame as shown below:
> A <- data.frame(x=1:10, a=rep(1,10), d=rep(2,10), b=rep(3,10))
> A
x a d b
1 1 1 2 3
2 2 1 2 3
3 3 1 2 3
4 4 1 2 3
5 5 1 2 3
6 6 1 2 3
7 7 1 2 3
8 8 1 2 3
9 9 1 2 3
10 10 1 2 3
I want to plot this with x on the x-axis and the other columns as lines on the y-axis, with the line representing the final column a little thicker than the others. I can do this with the following code:
library(ggplot2)
# Plot that creates a thicker line for the last column of data.
# However, the legend order is changed to alphabetical order.
p <- ggplot(A, aes(x))
for(i in 2:length(A)){
  gg.data <- data.frame(x=A$x, value=A[,i], name=names(A)[i])
  if(i==length(A)){
    p <- p + geom_line(data=gg.data, aes(y=value, color=name), size=1.1)
  } else{
    p <- p + geom_line(data=gg.data, aes(y=value, color=name))
  }
}
p
The problem with this plot is that the legend entries are now in alphabetical order (a, b, d). I don't want that; I want the order to stay a, d, b.
I can keep the order I want by using melt and plotting with the code below, but then I don't see how to increase the size of the line representing the last column of A.
library(reshape2) # for melt
Amelt <- melt(A, id.vars='x')
#Plot that orders legend according to order of columns in data frame.
#However, not sure how to thicken one particular line over the others.
pmelt <- ggplot(Amelt)+geom_line(aes(x=x, y=value, color=variable))
How can I get both things that I want?
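Have you tried using scale_colour_discrete(breaks=c("a","d","b")) to set the order of the legend? The lines are mapped to colour, so it has to be a colour scale; scale_fill_discrete would have no effect here.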
Please have a look at this link:
http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/
Hope this helps!
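One way to get both at once, sketched under the assumption of ggplot2 3.4.0 or later (where line thickness is the linewidth aesthetic and scale_linewidth_manual() exists): put the data in long format with variable as a factor whose levels follow the column order (the same shape melt() returns; it is built in base R here only to avoid the reshape2 dependency), then map both colour and line width to variable. Because both scales use the same variable and title, ggplot2 should merge them into one legend, ordered a, d, b, with the b key drawn thicker.

```r
library(ggplot2)

A <- data.frame(x = 1:10, a = rep(1, 10), d = rep(2, 10), b = rep(3, 10))

# Long format: one row per (x, column); factor levels keep column order
vars <- names(A)[-1]                                  # "a" "d" "b"
Along <- data.frame(
  x        = rep(A$x, times = length(vars)),
  value    = unlist(A[vars], use.names = FALSE),
  variable = factor(rep(vars, each = nrow(A)), levels = vars)
)

p <- ggplot(Along, aes(x = x, y = value, colour = variable,
                       linewidth = variable)) +
  geom_line() +
  # only the last column ("b") gets the thicker line
  scale_linewidth_manual(values = c(a = 0.5, d = 0.5, b = 1.1))
p
```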

Creating a Bar Plot with Proportions on ggplot

I'm trying to create a bar graph in ggplot that shows proportions rather than counts. I have c+geom_bar(aes(y=(..count..)/sum(..count..)*100)), but I'm not sure what either of the counts refers to. I tried putting in my data, but it didn't seem to work. What should I put here?
This is the data I'm using
> describe(topprob1)
topprob1
n missing unique Info Mean
500 0 9 0.93 3.908
1 2 3 4 5 6 7 8 9
Frequency 128 105 9 15 13 172 39 12 7
% 26 21 2 3 3 34 8 2 1
You haven't provided a reproducible example, so here's an illustration with the built-in mtcars data frame. Compare the following two plots: the first shows counts, the second shows proportions, displayed here as percentages. ..count.. is a computed variable that geom_bar() (through stat_count()) creates to hold the count for each bar; sum(..count..) is the total over all bars, so their ratio is each bar's proportion.
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar()
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..count../sum(..count..))) +
scale_y_continuous(labels=percent_format())
You can also use the ..prop.. computed variable together with the group aesthetic:
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..prop.., group = 1)) +
scale_y_continuous(labels=percent_format())
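In recent ggplot2 releases (3.4.0 onwards) the ..count.. / ..prop.. notation is deprecated in favour of after_stat(). A sketch applied to data shaped like the question's topprob1, rebuilt from the Frequency row of the describe() output (values 1 to 9, n = 500); the data frame name topprob_df is hypothetical.

```r
library(ggplot2)
library(scales)

# Rebuild the 500 observations from the frequency table in the question
topprob_df <- data.frame(
  topprob1 = factor(rep(1:9, times = c(128, 105, 9, 15, 13, 172, 39, 12, 7)))
)

p <- ggplot(topprob_df, aes(x = topprob1)) +
  # count is computed per bar by stat_count(); dividing by the total
  # count turns each bar height into a proportion
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = percent_format()) +
  labs(y = "Percent")
p
```

The bar heights then match the % row of the describe() output (for example, value 6 is 172/500, about 34%).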

How to print scale in a heatmap in R

I'm trying to create a heatmap in R as described in http://davetang.org/muse/2010/12/06/making-a-heatmap-with-r/:
data <- read.table("test.txt",sep="\t",header=TRUE,row.names=1)
data_matrix <- data.matrix(data)
install.packages("RColorBrewer")
library("RColorBrewer")
heatmap(data_matrix,Colv=NA,col=brewer.pal(9,"Blues"))
How do I get a colour scale beside the heatmap that shows the range of values corresponding to the shades used (small values as a light shade, high values as a dark shade), similar to the first heatmap in Creating a continuous heat map in R?
I am simply copying/adapting the following example code from the ggplot2 docs site:
> library(ggplot2)
> library(reshape2) # for melt
> M=melt(volcano)
> head(M)
Var1 Var2 value
1 1 1 100
2 2 1 101
3 3 1 102
4 4 1 103
5 5 1 104
6 6 1 105
> ggplot(M, aes(x=Var1, y=Var2, fill=value)) + geom_tile()
geom_tile is the important bit here. You can choose your own colours by adding something like:
+ scale_fill_gradient(low="green", high="red")
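A self-contained sketch of the same idea that builds the long data with base R instead of melt() (same Var1 / Var2 / value layout) and uses a light-to-dark single-hue gradient, so small values are light and large values dark, as asked. The two hex colours are an arbitrary choice of light and dark blue; the name argument sets the title of the colour bar legend.

```r
library(ggplot2)

# Long format of the built-in volcano matrix: one row per cell.
# expand.grid() varies Var1 fastest, matching as.vector()'s column-major order.
M <- expand.grid(Var1 = seq_len(nrow(volcano)), Var2 = seq_len(ncol(volcano)))
M$value <- as.vector(volcano)

p <- ggplot(M, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  # light colour for low values, dark colour for high values;
  # the colour bar legend is drawn automatically
  scale_fill_gradient(low = "#DEEBF7", high = "#08306B", name = "Value") +
  coord_equal()
p
```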

Multiple Plots in R

I want to plot 2 graphs in 1 frame, basically to compare the results.
The code I tried is:
plot(male,pch=16,col="red")
lines(male,pch=16,col="red")
par(new=TRUE)
plot(female,pch=16,col="green")
lines(female,pch=16,col="green")
When I run it, I DO get 2 plots in one frame, BUT it changes my y-axis: the axis labels read -4, -4, -3, -3, ...
It's as if both plots display their own axis.
Please help.
Thanks
You don't need the second plot() call. Just use:
> plot(male,pch=16,col="red")
> lines(male, pch=16, col = "red")
> lines(female, pch=16, col = "green")
> points(female, pch=16, col = "green")
Note: this sets the axis limits from the first data set only, so some of the second series may fall outside the plotting region. You can fix this by setting the limits of the first plot yourself, e.g. plot(male, pch=16, col="red", ylim=range(male, female)).
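A sketch of that fix in base R, using the male and female values printed later in this thread as sample data. type = "b" draws points and lines in one call, and the legend() call is an optional addition for telling the series apart.

```r
# Sample values (from the data frame printed in this thread)
male   <- c(-0.4334269341, 0.8829902521, -0.6052638138, 0.2270191965,
            3.5123679143, 0.0615821014, 3.6280155376, 2.3508890457,
            2.9824432680, 1.1938052833)
female <- c(1.3151289227, 1.9956491556, 0.8229389822, 1.2062726250,
            0.6633392820, 1.1331669670, -0.9002109636, 3.2137052284,
            0.3113656610, 1.4664434215)

# One y range covering both series, so neither gets clipped
yl <- range(male, female)

plot(male, type = "b", pch = 16, col = "red", ylim = yl,
     xlab = "index", ylab = "value")
lines(female, type = "b", pch = 16, col = "green")
legend("topleft", legend = c("male", "female"),
       col = c("red", "green"), pch = 16, lty = 1)
```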
For this kind of plot I much prefer ggplot2. The main reason: it generalizes nicely to more than two lines without a lot of code.
The drawback for your sample data is that it is not stored in a data.frame, which ggplot2 requires. Furthermore, you always need an x variable to plot against. So let us first create a data.frame from your data.
dat <- data.frame(index=rep(1:10, 2), vals=c(male, female), group=rep(c('male', 'female'), each=10))
Which leaves us with
> dat
index vals group
1 1 -0.4334269341 male
2 2 0.8829902521 male
3 3 -0.6052638138 male
4 4 0.2270191965 male
5 5 3.5123679143 male
6 6 0.0615821014 male
7 7 3.6280155376 male
8 8 2.3508890457 male
9 9 2.9824432680 male
10 10 1.1938052833 male
11 1 1.3151289227 female
12 2 1.9956491556 female
13 3 0.8229389822 female
14 4 1.2062726250 female
15 5 0.6633392820 female
16 6 1.1331669670 female
17 7 -0.9002109636 female
18 8 3.2137052284 female
19 9 0.3113656610 female
20 10 1.4664434215 female
Note that this command assumes you have 10 values per group; it has to be adjusted to match your actual data.
Now we may use the mighty power of ggplot2:
library(ggplot2)
ggplot(dat, aes(x=index, y=vals, color=group)) + geom_point() + geom_line()
The call above has three elements. ggplot() initializes the plot, tells R to use dat as the data source, and defines the plot aesthetics, or more precisely, which aesthetic properties of the plot (such as colour, position, size, etc.) are driven by your data. We map x and y as expected and additionally map the colour aesthetic to the grouping variable; that makes ggplot automatically draw the two groups in different colours. Finally, we add two geometries that do what their names say: draw points and draw lines.
If your data are stored the standard way in R (in a data.frame), you end up with one line of code. And if, after a few thousand years of evolution, you want to add another gender, it is still one line of code.
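A variant of the data.frame() construction from this answer that does not assume 10 values per group: the series go into a named list, so the vectors may have different lengths and adding a group is one more list entry. The name series is hypothetical; the sample values are the ones printed in the dat table above.

```r
library(ggplot2)

# Sample values from the dat table in this answer
male   <- c(-0.4334269341, 0.8829902521, -0.6052638138, 0.2270191965,
            3.5123679143, 0.0615821014, 3.6280155376, 2.3508890457,
            2.9824432680, 1.1938052833)
female <- c(1.3151289227, 1.9956491556, 0.8229389822, 1.2062726250,
            0.6633392820, 1.1331669670, -0.9002109636, 3.2137052284,
            0.3113656610, 1.4664434215)

# Named list: names become group labels; lengths may differ
series <- list(male = male, female = female)

dat <- data.frame(
  index = unlist(lapply(series, seq_along), use.names = FALSE),
  vals  = unlist(series, use.names = FALSE),
  group = rep(names(series), times = lengths(series))
)

p <- ggplot(dat, aes(x = index, y = vals, color = group)) +
  geom_point() +
  geom_line()
p
```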
