Heatmap with one column in R

I have a dataframe with scores associated with every cell and I have the result of clustering (not related to the score) in one column of my dataframe:
>head(clust.labs)
type value cell
1 1 0.3 1
2 1 0.5 2
3 1 -0.3 3
4 1 0.5 4
5 1 0.3 5
6 1 0.3 6
I want to make a heatmap with a single column representing the cells, with the samples in order and colours representing the scores (value). My current heatmap, made with the code below, puts the tiles in separate columns; I want the coloured part squished into one column, with a rectangle on the left representing the samples. How can I do that?
library(ggplot2)
ggplot(data = clust.labs, mapping = aes(x = type,
                                        y = cell,
                                        fill = value)) +
  geom_tile() +
  xlab(label = "Sample")

I am not completely sure how the output should look, but here is an attempt. Since you want a single-column plot, add a variable that has the same value for every sample (here called dummy) and map it to x. Then draw the heatmap and add the rectangle with geom_rect. Finally, adjust the x-axis breaks so the -0.5 and 0.5 labels are not shown.
library(ggplot2)
library(dplyr)
clust.labs |>
  mutate(dummy = 1) |>   # same x value for every row -> a single column
  ggplot(aes(x = factor(dummy),
             y = cell,
             fill = value)) +
  # Add rectangle
  geom_rect(aes(xmin = factor(-0.5),
                xmax = factor(0.5),
                ymin = 0.5,
                ymax = 1.5),
            colour = "black",
            fill = "transparent") +
  geom_tile() +
  # Change breaks for the x axis so only the "1" position is labelled
  scale_x_discrete(breaks = c(0, 1)) +
  xlab(label = "Sample")
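A variant of the same dummy-column idea, offered as a sketch rather than a drop-in replacement: instead of adding extra factor levels for the rectangle, annotate() can take numeric positions on the discrete axis, where the single column sits at x = 1. The toy clust.labs below reuses the six rows shown in the question, and the rectangle coordinates (0.3 to 0.45) are arbitrary assumptions. It is safest to add geom_tile() before annotate() so the x scale is created as discrete.

```r
library(ggplot2)

# Toy stand-in for clust.labs, using the six rows shown in the question
clust.labs <- data.frame(type  = rep(1, 6),
                         value = c(0.3, 0.5, -0.3, 0.5, 0.3, 0.3),
                         cell  = 1:6)
clust.labs$dummy <- 1   # same value for every row -> one x position

p <- ggplot(clust.labs, aes(x = factor(dummy), y = cell, fill = value)) +
  geom_tile() +                                   # discrete x layer first
  # Outline to the left of the column; positions are arbitrary choices
  annotate("rect", xmin = 0.3, xmax = 0.45,
           ymin = 0.5, ymax = max(clust.labs$cell) + 0.5,
           colour = "black", fill = NA) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  xlab("Sample")
print(p)
```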

This can be done with plot_ly() from the plotly package. Convert the data frame to a matrix (one row per cell, one column per heatmap column) and then run:
library(plotly)
# One row per cell; the columns become the heatmap's x positions
my.mat <- cbind(KS.score = as.numeric(clust.labs$value),
                Cluster  = as.numeric(clust.labs$type))
rownames(my.mat) <- as.character(seq_len(nrow(my.mat)))
plot_ly(z = my.mat, type = "heatmap")
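If plotly is not an option, base graphics can draw the same one-column strip. This is a rough sketch that swaps in base R's image() for plot_ly(); the toy clust.labs and the "Blue-Red" palette choice are assumptions for illustration.

```r
# Toy stand-in for clust.labs, using the six rows shown in the question
clust.labs <- data.frame(type  = rep(1, 6),
                         value = c(0.3, 0.5, -0.3, 0.5, 0.3, 0.3),
                         cell  = 1:6)

# image() wants z with one row per x cell and one column per y cell:
# a single x cell (the one column) and one y cell per row of clust.labs
z <- matrix(clust.labs$value, nrow = 1)

# x gives the two edges of the single column; y gives the cell centres
image(x = c(0.5, 1.5), y = clust.labs$cell, z = z,
      col = hcl.colors(12, "Blue-Red"),
      xaxt = "n", xlab = "Sample", ylab = "cell")
```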

Related

Trouble graphing two columns on one graph in R

I just started learning R. I melted my data frame and used ggplot to plot it. There are supposed to be two lines on the same graph, but the connecting lines seem random: the points are plotted correctly, but the lines are wrong.
library(reshape2)
library(ggplot2)
# Melted my data to create a new data frame
AvgSleep2_DF <- melt(AvgSleep_DF, id.vars = 'SleepDay_Date',
                     variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
  geom_point(aes(colour = series)) +
  geom_line(aes(colour = series))
Including or omitting aes(colour = series) in geom_line() produces the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I deliberately add a colour column that differs from the series column!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
Through inheritance, each layer uses the aesthetics defined in the top-level ggplot() call unless it overrides them.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) + # setting the size to stress the point layer call
  geom_line()            # geom_line will "inherit" a "grouping" from the colour set above
This gives a plot in which points 1 to 3 are joined by one line and points 4 and 5 by another.
We can instead control the grouping of each line (segment) explicitly:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +
  geom_line(aes(group = series))  # defining specific grouping
Note: because the 3rd point is the only member of its group in the series column, geom_line() has nothing to connect it to, so in this case it appears only as a point.
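One way to see the grouping directly, as a small sketch using ggplot2's layer_data(): the group column of the built line layer shows which points get joined. Applied to the question, the same check on the melted data would show whether each series forms its own group.

```r
library(ggplot2)

df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(2, 2, 3, 4, 2),
                 colour = factor(c(rep(1, 3), rep(2, 2))),
                 series = c(1, 1, 2, 3, 3))

# Grouping inherited from the discrete colour aesthetic
p_inherit <- ggplot(df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +
  geom_line()

# Grouping set explicitly on the line layer
p_series <- ggplot(df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +
  geom_line(aes(group = series))

# Layer 2 is geom_line(); points sharing a group value are joined
layer_data(p_inherit, 2)$group
layer_data(p_series, 2)$group
```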

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
...but the vertical axes are not equally spaced: the left panel has more vertical ticks than the right one. I would like to fill up the right vertical axis with unlabeled ticks that have no plotted values. In this case that would add 3 empty ticks, but it should scale to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I'm not sure why you want to arrange the categorical variable this way other than for aesthetics (it does seem to look better). At any rate, a simple workaround that seems to handle general cases relies on ggplot placing categorical values at the numeric positions 1, 2, 3, ... on the axis. For each x value, plot a transparent point at a y position just above the largest number of categories in any group (max_rows + 0.5), which stretches every panel's axis to the same length. Points are plotted for all x values as a simple way to handle groups whose ranges of x values do not overlap. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
df %<>%
  rbind(data.frame(let = c(" ", "  ", "   "),   # distinct blank labels
                   value = NA,
                   group = "foo"))
I just add three more entries for foo that are blank strings (i.e., just spaces) of different lengths. There must be a more elegant solution, though.
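To make the blank-entry idea scale to any df size, the padding rows can be generated per group. This is a sketch with a hypothetical helper, pad_groups(), and made-up values; it assumes the columns are named let, value and group as in the question.

```r
# Demo data in the shape of the question's df (values made up)
df <- data.frame(let = LETTERS[1:13], value = 1:13,
                 group = rep(c("foo", "bar"), times = c(5, 8)))

# Hypothetical helper: append NA-valued rows with distinct blank labels
# (" ", "  ", "   ", ...) so every group has as many categories as the largest
pad_groups <- function(df) {
  counts <- table(df$group)
  target <- max(counts)
  pads <- lapply(names(counts), function(g) {
    k <- target - sum(df$group == g)
    if (k == 0) return(NULL)
    data.frame(let = strrep(" ", seq_len(k)), value = NA, group = g)
  })
  rbind(df, do.call(rbind, pads))
}

padded <- pad_groups(df)
table(padded$group)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(padded, aes(x = let, y = value)) +
    geom_point(na.rm = TRUE) +   # padding rows have no value to draw
    coord_flip() +
    facet_wrap(~group, scales = "free")
}
```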
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())

Control dot colour in ggplot

How can I control the colour of the dots in a ggplot2 scatter plot? I need the first 20 points to have one colour, then the next 20 to have a different colour. At the moment I am using base R plot output. The matrix looks like this:
1 4
1 3
2 9
-1 8
9 9
and I have a colour vector which looks like
cols<-c("#B8DBD3","#FFB933","#FF6600","#0000FF")
then
plot(mat[,1],mat[,2],col=cols)
works.
How could I do this in ggplot?
Update regarding the colours: my colour vector looks like this:
n <- 100
colours <- rep(c("#B8DBD3", "#FFB933", "#FF6600", "#0000FF",
                 "#00008B", "#ADCD00", "#008B00", "#9400D3"), each = n)
when I then do
d<-ggplot(new,aes(x=PC1,y=PC2,col=rr))
d+theme_bw() +
scale_color_identity(breaks = rep(colours, each = 1)) +
geom_point(pch=21,size=7)
the colours look completely different from
plot(new[,1],new[,2],col=colours)
this looks like
http://fs2.directupload.net/images/150417/2wwvq9u2.jpg
while ggplot with the same colours looks like
http://fs1.directupload.net/images/150417/bwc5wn7b.jpg
I would recommend creating a column that designates which group each point belongs to.
library(ggplot2)
xy <- data.frame(x = rnorm(80), y = rnorm(80), col = as.factor(rep(1:4, each = 20)))
cols<-c("#B8DBD3","#FFB933","#FF6600","#0000FF")
ggplot(xy, aes(x = x, y = y, col = col)) +
theme_bw() +
scale_colour_manual(values = cols) +
geom_point()
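The same grouping approach applied to the eight-colour vector from the update, as a sketch: the coordinates are random stand-ins for PC1/PC2, and grp is a hypothetical column with one level per block of n rows. Note also that with pch = 21 (shape 21) the point interior is drawn with fill, while colour only sets the outline, which may be one reason the question's ggplot output looked different.

```r
library(ggplot2)

set.seed(1)
n <- 100
colours <- c("#B8DBD3", "#FFB933", "#FF6600", "#0000FF",
             "#00008B", "#ADCD00", "#008B00", "#9400D3")

# Made-up coordinates standing in for PC1/PC2; one group per block of n rows
new <- data.frame(PC1 = rnorm(length(colours) * n),
                  PC2 = rnorm(length(colours) * n),
                  grp = factor(rep(seq_along(colours), each = n)))

p <- ggplot(new, aes(x = PC1, y = PC2, colour = grp)) +
  theme_bw() +
  scale_colour_manual(values = colours) +   # level i gets colours[i]
  geom_point(size = 3)
```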

How to plot stacked proportional graph?

I have a data frame:
x <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
id val0 val1 val2
1 a 1 4 7
2 b 2 5 8
3 c 3 6 9
I want to plot a stacked bar plot that shows the percentage of each column. Each bar represents one row, and every bar has the same total length but is split into three colours, each representing the percentage of val0, val1 and val2.
I looked for this, but I only found ways to plot a stacked graph, not a stacked proportional graph.
Using ggplot2
For ggplot2 and geom_bar, work in long format and pre-calculate the percentages. For example:
library(reshape2)
library(plyr)
library(ggplot2)
# long format with a column of proportions within each id
xlong <- ddply(melt(x, id.vars = 'id'), .(id), mutate, prop = value / sum(value))
ggplot(xlong, aes(x = id, y = prop, fill = variable)) + geom_bar(stat = 'identity')
# note: position = 'fill' would work with the value column directly
ggplot(xlong, aes(x = id, y = value, fill = variable)) +
  geom_bar(stat = 'identity', position = 'fill')
# will return the same plot as above
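The same long-format-plus-proportions step can be done without reshape2 or plyr. This is a base-R sketch using stack() and ave(); the column name variable is chosen to match the melt() output above.

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# Long format: one row per (id, variable) pair
xlong <- data.frame(id = rep(x$id, times = ncol(x) - 1), stack(x[, -1]))
names(xlong)[names(xlong) == "ind"] <- "variable"

# Proportion of each value within its id
xlong$prop <- ave(as.numeric(xlong$values), xlong$id,
                  FUN = function(v) v / sum(v))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(xlong, aes(x = id, y = prop, fill = variable)) + geom_col()
}
```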
base R
A table object can be plotted as a mosaic plot using plot(). Your x is (almost) a table object:
# get the numeric columns as a matrix
xt <- as.matrix(x[,2:4])
# set the rownames to be the first column of x
rownames(xt) <- x[[1]]
# set the class to be a table so plot will call plot.table
class(xt) <- 'table'
plot(xt)
You could also use mosaicplot() directly:
mosaicplot(x[,2:4], main = 'Proportions')
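A mosaic plot scales bar widths by row totals. If you want bars of equal length as described in the question, a base-R sketch with prop.table() and barplot() (a swapped-in approach, not part of the answer above) could look like this:

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# Row-wise proportions: each row of props sums to 1
props <- prop.table(as.matrix(x[, -1]), margin = 1)
rownames(props) <- x$id

# barplot() stacks the rows of its height matrix within each column,
# so transpose to get one full-height bar per id
barplot(t(props), legend.text = TRUE, ylab = "Proportion")
```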

How can I highlight subset of values in ggplot2 plots?

For example, I have a basic stacked plot:
ggplot(diamonds, aes(x=factor(color),fill=factor(cut)))+geom_bar(position="fill")
and a small subset of diamonds with a carat value higher than 3:
subset(diamonds,carat>3)
I want to highlight these particular values on the plot (as points, or as labels if the diamonds had IDs) to see which part of the distribution they lie in. Is there any way to do something like that?
PS: unfortunately, I'm not allowed to post figures.
The following inserts the count of diamonds with carat greater than 3 into the bar segments. I've broken the problem down into a number of steps:
Step 1: Create a new variable identifying "carat greater than 3".
Step 2: Get a summary table of the counts: the number of diamonds for each color and cut, and the number with carat greater than 3 for each color and cut. I used the ddply() function from the plyr package.
Step 3: Draw the bar plot without the labels.
Step 4: Add to the summary table a variable giving the y positions of the labels.
Step 5: Add the geom_text layer to the plot. The data frame for geom_text is the summary table. geom_text() needs aesthetics for label (here, the count for carat greater than 3), y position (calculated in the previous step), and x position (color).
library(ggplot2)
library(plyr)
# Step 1
diamonds$caratGT3 = ifelse(diamonds$carat > 3, 1, 0)
# Step 2
diamonds2 = ddply(diamonds, .(color, cut), summarize, CountGT3 = sum(caratGT3))
diamonds2$Count = count(diamonds, .(color, cut))[,3]
diamonds2
# Step 3
p = ggplot() + geom_bar(data = diamonds, aes(x=factor(color),fill=factor(cut)))
# Step 4
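# Since ggplot2 2.2.0, stacking puts the first level of cut (Fair) on top,
# so accumulate the counts starting from the last level
diamonds2 <- ddply(diamonds2, .(color),
                   function(x) {
                     x$cfreq <- rev(cumsum(rev(x$Count)))  # top edge of each segment
                     x$pos <- x$cfreq - x$Count / 2         # segment midpoint
                     x
                   })
# Step 5
(p <- p + geom_text(data = diamonds2,
                    aes(x = factor(color), y = pos, label = CountGT3),
                    size = 3, colour = "black", fontface = "bold"))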
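An alternative sketch that avoids computing label positions by hand: give geom_text() the same grouping and stacking as the bars and let position_stack(vjust = 0.5) centre each label in its segment. This swaps base R's xtabs() in for plyr, and leaving the zero counts blank is a choice made here for readability.

```r
library(ggplot2)

# Totals and counts of carat > 3 for every colour x cut combination
counts <- as.data.frame(xtabs(~ color + cut, data = diamonds),
                        responseName = "Count")
gt3 <- as.data.frame(xtabs(~ color + cut, data = subset(diamonds, carat > 3)),
                     responseName = "CountGT3")
counts <- merge(counts, gt3, by = c("color", "cut"))

p <- ggplot(counts, aes(x = color, y = Count, fill = cut)) +
  geom_col() +
  # Same grouping (fill = cut) and stacking as the bars, so each label
  # sits in the middle of its own segment; zero counts are left blank
  geom_text(aes(label = ifelse(CountGT3 > 0, CountGT3, "")),
            position = position_stack(vjust = 0.5), size = 3)
```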
