How to print scale in a heatmap in R

I am trying to create a heatmap in R, following http://davetang.org/muse/2010/12/06/making-a-heatmap-with-r/:
data <- read.table("test.txt",sep="\t",header=TRUE,row.names=1)
data_matrix <- data.matrix(data)
install.packages("RColorBrewer")
library("RColorBrewer")
heatmap(data_matrix,Colv=NA,col=brewer.pal(9,"Blues"))
How do I get a colour scale beside the heatmap that shows the range of values corresponding to the shades used (small values as light shades, high values as dark shades), similar to the first heatmap in "Creating a continuous heat map in R"?

I am simply copying/adapting the following example code from the ggplot2 docs site:
> library(ggplot2)
> library(reshape2) # for melt
> M=melt(volcano)
> head(M)
Var1 Var2 value
1 1 1 100
2 2 1 101
3 3 1 102
4 4 1 103
5 5 1 104
6 6 1 105
> ggplot(M, aes(x=Var1, y=Var2, fill=value)) + geom_tile()
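geom_tile is the important bit here. Because value is mapped to fill, ggplot2 draws the colour scale as a legend beside the plot automatically. You can choose your own colours by adding something like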
+ scale_fill_gradient(low="green", high="red")
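To connect this back to the matrix in the question, here is a minimal sketch. The original test.txt is not available, so data_matrix below is a made-up stand-in with hypothetical row and column names, and the two colours are just one light-to-dark blue pair.

```r
library(ggplot2)
library(reshape2)  # for melt

# Hypothetical stand-in for data_matrix (the real test.txt is not available)
set.seed(1)
data_matrix <- matrix(runif(20, 0, 100), nrow = 5,
                      dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# melt() on a matrix gives one row per cell: Var1 (row), Var2 (column), value
long <- melt(data_matrix)

# Mapping value to fill makes ggplot2 draw the colour scale as a legend
p <- ggplot(long, aes(x = Var2, y = Var1, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "#DEEBF7", high = "#08306B") +  # light = small, dark = large
  labs(x = NULL, y = NULL)
p
```

Note that, unlike heatmap(), this does not cluster or reorder the rows; the tiles follow the order of the matrix.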

Related

Formatting of grouped bar chart in ggplot

I am currently stuck on formatting a grouped bar chart.
I have a dataframe, which I would like to visualize:
iteration position value
1 1 eEP_SRO 20346
2 1 eEP_drift 22410
3 1 eEP_hole 29626
4 2 eEP_SRO 35884
5 2 eEP_drift 39424
6 2 eEP_hole 51491
7 3 eEP_SRO 51516
8 3 eEP_drift 55523
9 3 eEP_hole 74403
The position should be shown as color and the value should be represented in the height of the bar.
My code is:
fig <- ggplot(df_eEP_Location_plot, aes(fill=position, y=value, x=iteration, order=position)) +
geom_bar(stat="identity")
which gives me this result:
I would like to have a correct y-axis labelling and would also like to sort my bars from largest to smallest (ignoring the iteration number). How can I achieve this?
Thank you very much for your help!
I would recommend using fct_reorder from the forcats package to reorder your iterations along the specified values prior to plotting in ggplot. See the following with the sample data you've provided:
library(ggplot2)
library(forcats)
iteration <- factor(c(1,1,1,2,2,2,3,3,3))
position <- factor(rep(c("eEP_SRO","eEP_drift","eEP_hole"), times=3))
value <- c(20346,22410,29626,35884,39424,51491,51516,55523,74403)
df_eEP_Location_plot <- data.frame(iteration, position, value)
df_eEP_Location_plot$iteration <- fct_reorder(df_eEP_Location_plot$iteration,
-df_eEP_Location_plot$value)
fig <- ggplot(df_eEP_Location_plot, aes(y=value, x=iteration, fill=position)) +
geom_bar(stat="identity")
fig
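The code above gives stacked bars. If side-by-side bars are closer to what "grouped" means in the question (that is my assumption about the intent), a possible variant is to use geom_col with position = "dodge". This is only a sketch on the sample data:

```r
library(ggplot2)
library(forcats)

df_eEP_Location_plot <- data.frame(
  iteration = factor(rep(1:3, each = 3)),
  position  = factor(rep(c("eEP_SRO", "eEP_drift", "eEP_hole"), times = 3)),
  value     = c(20346, 22410, 29626, 35884, 39424, 51491, 51516, 55523, 74403)
)

# Order iterations from largest to smallest value
# (fct_reorder summarises each level with the median by default)
df_eEP_Location_plot$iteration <- fct_reorder(df_eEP_Location_plot$iteration,
                                              -df_eEP_Location_plot$value)

# geom_col() is geom_bar(stat = "identity"); "dodge" puts the bars side by side
fig <- ggplot(df_eEP_Location_plot, aes(x = iteration, y = value, fill = position)) +
  geom_col(position = "dodge")
fig
```

If the y-axis labels still look wrong with your own data, it may be worth checking with str() that value is stored as a number rather than as a factor or character column.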

ggplot multiple lines in same graph

I am trying to plot multiple gene expressions over time in the same graph to demonstrate a similar profile and then add a line to illustrate the mean of total for each timepoint (like the figure 4b in recent Nature comm article https://www.nature.com/articles/s41467-017-02546-5/figures/4). My data has been normalised to be around 0 so they are all on the same scale.
df2 sample:
variable value gene
1 5 -0.610384193 1
2 5 -6.25967087 2
3 5 -3.773389731 3
50 6 -0.358879035 1
51 6 -6.066341017 2
52 6 -4.202998579 3
99 7 -0.103885903 1
100 7 -6.648844687 2
101 7 -5.041554127 3
I plot the expression levels with ggplot2:
plotC <- ggplot(df2, aes(x=variable, y=value, group=factor(gene), colour=gene)) + geom_line(size=0.5, aes(color=gene), alpha=0.4)
But adding the mean line in red to this plot is proving difficult. I calculated the means and put them in another dataframe:
means
value variable gene
1 -1.5037354 5 50
2 -0.8783492 6 50
3 -0.7769085 7 50
Then tried adding them as another layer:
plotC + geom_line(data=means, aes(x=variable, y=value, color="red", group=factor(gene)), size=0.75)
But I get the error: Error: Discrete value supplied to continuous scale
Do you have any suggestions as to how I can plot this mean on the same graph in another color?
Thank you,
Anna
edit: the answer by RG20 is helpful, thanks for pointing out I had the color in the wrong place. However it plots the line outside the rest of the graph... I really don't understand what's wrong with my graph...
plotC + geom_line(data=means, aes(x=variable, y=value, group=factor(gene)), color='red',size=0.75)
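Some background on why moving the colour helps: inside aes(), color="red" is treated as a data value (a discrete string) and is sent to the same colour scale that already handles the numeric gene column, hence the error. Outside aes() it is simply a fixed colour. The sketch below rebuilds the plot from the nine sample rows only, so the means differ from the ones in the question, and it computes them with aggregate() rather than the asker's own code.

```r
library(ggplot2)

# The nine sample rows from the question
df2 <- data.frame(
  variable = rep(c(5, 6, 7), each = 3),
  value = c(-0.610384193, -6.25967087, -3.773389731,
            -0.358879035, -6.066341017, -4.202998579,
            -0.103885903, -6.648844687, -5.041554127),
  gene = rep(1:3, times = 3)
)

# Mean per time point (from the sample rows only)
means <- aggregate(value ~ variable, data = df2, FUN = mean)

plotC <- ggplot(df2, aes(x = variable, y = value)) +
  geom_line(aes(group = gene, colour = gene), alpha = 0.4) +
  # fixed colour goes outside aes(), so it never reaches the colour scale
  geom_line(data = means, colour = "red")
plotC
```

As for the line being drawn outside the rest of the graph, one thing worth checking (a guess, since the full data is not shown) is whether variable has the same type in both data frames. If it is a factor in df2 but numeric in means, the two layers will not line up on the x-axis; str(df2) and str(means) will show this.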

Reordering legend while modifying one particular line for a line chart in ggplot

Let's say I have a simple data frame as shown below:
> A <- data.frame(x=1:10, a=rep(1,10), d=rep(2,10), b=rep(3,10))
> A
x a d b
1 1 1 2 3
2 2 1 2 3
3 3 1 2 3
4 4 1 2 3
5 5 1 2 3
6 6 1 2 3
7 7 1 2 3
8 8 1 2 3
9 9 1 2 3
10 10 1 2 3
I want to plot this with x on the x-axis and the other columns as lines on the y-axis. I want the line representing final column to be a little thicker than the other lines. So I can do this with the following code, which leads to the plot shown below it:
library(ggplot2)
#Plot that creates a thicker line for last column of data.
#However, order of legend is changed to alphabetical order.
p <- ggplot(A, aes(x))
for(i in 2:length(A)){
gg.data <- data.frame(x=A$x, value=A[,i], name=names(A)[i])
if(i==length(A)){
p <- p + geom_line(data=gg.data, aes(y=value, color=name), size=1.1)
} else{
p <- p + geom_line(data=gg.data, aes(y=value, color=name))
}
}
Now the problem with the plot above is that the order of the variables in the legend has changed to align with alphabetical order. I don't want that; instead I want the order to remain a,d,b.
I can keep the order as I wish by using melt and then plotting using the code below, but now I don't see how to increase the size of the line representing the last column in A.
library(reshape2) # for melt
Amelt <- melt(A, id.vars='x')
#Plot that orders legend according to order of columns in data frame.
#However, not sure how to thicken one particular line over the others.
pmelt <- ggplot(Amelt)+geom_line(aes(x=x, y=value, color=variable))
How can I get both things that I want?
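Have you tried using scale_color_discrete(breaks=c("a","d","b")) to set the order of the legend entries? (The lines are mapped to the color aesthetic, so it is the color scale rather than the fill scale that applies here.)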
Please have a look at this link:
http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/
Hope this helps!
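One way to get both things at once, as a rough sketch: keep the melted data (so the legend follows the column order) and draw the last column a second time as a thicker layer. The limits argument and the extra layer are my own additions, not something from the link above.

```r
library(ggplot2)
library(reshape2)  # for melt

A <- data.frame(x = 1:10, a = rep(1, 10), d = rep(2, 10), b = rep(3, 10))
Amelt <- melt(A, id.vars = "x")  # variable is a factor with levels a, d, b

last_col <- names(A)[ncol(A)]    # "b", the column to emphasise

pmelt <- ggplot(Amelt, aes(x = x, y = value, color = variable)) +
  geom_line() +
  # thicker copy of the last column on top; kept out of the legend
  # (size is called linewidth in ggplot2 3.4.0 and later)
  geom_line(data = subset(Amelt, variable == last_col),
            size = 1.1, show.legend = FALSE) +
  # pin the legend order to the column order of A
  scale_color_discrete(limits = names(A)[-1])
pmelt
```

The thin line for b is still drawn underneath the thick one, which is harmless here because they overlap exactly.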

Creating a Bar Plot with Proportions on ggplot

I'm trying to create a bar graph on ggplot that has proportions rather than counts, and I have c+geom_bar(aes(y=(..count..)/sum(..count..)*100)) but I'm not sure what either of the counts refer to. I tried putting in the data but it didn't seem to work. What should I input here?
This is the data I'm using
> describe(topprob1)
topprob1
n missing unique Info Mean
500 0 9 0.93 3.908
1 2 3 4 5 6 7 8 9
Frequency 128 105 9 15 13 172 39 12 7
% 26 21 2 3 3 34 8 2 1
You haven't provided a reproducible example, so here's an illustration with the built-in mtcars data frame. Compare the following two plots. The first gives counts. The second gives proportions, which are displayed in this case as percentages. ..count.. is an internal variable that ggplot creates to store the count values.
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar()
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..count../sum(..count..))) +
scale_y_continuous(labels=percent_format())
You can also use the ..prop.. computed variable together with the group aesthetic:
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..prop.., group = 1)) +
scale_y_continuous(labels=percent_format())
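For what it's worth, ggplot2 3.3.0 and later also offer after_stat() as a replacement for the ..count.. notation. A sketch of the same proportion plot written that way, assuming a ggplot2 version that has after_stat():

```r
library(ggplot2)
library(scales)

# count is computed by stat_count for each bar;
# sum(count) is taken over all bars in the layer
p <- ggplot(mtcars, aes(factor(am))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "am", y = "share of cars")
p
```

The bar heights then add up to 100%, since each count is divided by the total over all bars.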

plot lines using qplot

I want to plot multiple lines on the same plot using qplot in the ggplot2 package.
But I'm having some problems with it.
Using the base plot and lines functions, I would do something like
m<-cbind(1:4,5:8,-(5:8))
colnames(m)<-c("time","y1","y2")
m<-as.data.frame(m)
> m
time y1 y2
1 1 5 -5
2 2 6 -6
3 3 7 -7
4 4 8 -8
plot(x=m$time,y=m$y1,type='l',ylim=range(m[,-1]))
lines(x=m$time,y=m$y2)
Thanks
Using the reshape package to melt m:
library(reshape)
library(ggplot2)
m2 <- melt(m, id = "time")
p <- ggplot(m2, aes(x = time, y = value, color = variable))
p + geom_line() + ylab("y")
You could rename the columns in the new data.frame to your liking. The trick here is to have a factor that denotes each of the lines you want to plot.
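If you would rather not reshape the data for just two columns, a possible alternative (closer to the plot/lines approach in the question) is to add one geom_line layer per column. This is only a sketch; it does not give you a legend, which the melt version does.

```r
library(ggplot2)

m <- data.frame(time = 1:4, y1 = 5:8, y2 = -(5:8))

# One layer per column, sharing the x aesthetic from ggplot()
p <- ggplot(m, aes(x = time)) +
  geom_line(aes(y = y1)) +
  geom_line(aes(y = y2)) +
  ylab("y")
p
```

This gets unwieldy once there are more than a few columns, which is where the melt approach pays off.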
