Binary Data heatmap - r

Can anyone tell me how to plot a heatmap for binary data, similar to the heatmap in this question: "Binary R heatmap still displays gradient"?
I tried, but I suspect I am not reading the input file properly. Here is the link to the data I want to plot:
https://www.dropbox.com/s/7k1uskwrfuaugw3/Dataset.csv?dl=0
Here is a subset of my data:
Strains gene1 gene2 gene3 gene4 gene5
strain1 1 1 1 1 1
strain2 1 1 1 1 1
strain3 1 1 1 1 1
strain4 1 1 1 1 1
strain5 1 1 1 1 1
strain6 1 1 1 1 1
strain7 1 1 1 1 1
strain8 1 1 1 1 1
strain9 1 1 0 0 0
And here is the code I am using:
library(gplots)
file1<- read.csv('Dataset.csv',header = T)
class(file1)
dat <- data.frame(file1)
dim(dat)
names(dat)
head(dat)
rownames(dat) <-dat$Strains
head(dat)
dim(dat)
head(dat)
dat.tdy <- dat[,2:26]
dat.n <- scale(t(dat.tdy))
dat.tn <- t(dat.n)
col = c("black", "grey")
row_names <- rownames(dat.tn)
heatmap.2(dat.tn, scale = "none", Rowv = NA, Colv = NA, col = c("black", "grey"), margin=c(6, 4),trace='none',labRow = row_names,
lhei=c(1,4),cexRow = 1,cexCol = 1,
lwid=c(.1,1), keysize=0.1, key.par = list(cex=0.5), sepwidth=c(0.1,0.1),
sepcolor="white",
colsep=1:ncol(dat),
rowsep=1:nrow(dat))
This code runs and produces output, but when I cross-check against the input file, the color matrix in the heatmap does not match it. For example, the heatmap shows only one black box for gene7, although gene7 has about 13 zeros in the input file.
I feel there is a simpler way to do this, but as I am new to R I cannot figure it out. I think I am doing something wrong when reading the input file. Please help.
Thanks

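Please try this code. You made an error in the dimensions: read the file with row.names = 1 so that the Strains column becomes the row names, and then select columns 1:25.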
library(gplots)
file1<- read.csv('Dataset.csv',header = T,row.names = 1)
class(file1)
dat <- data.frame(file1)
dim(dat)
names(dat)
head(dat)
# row.names = 1 above already set the strain names as row names (dat$Strains no longer exists)
head(dat)
dim(dat)
head(dat)
dat.tdy <- dat[,1:25]
dat.n <- scale(t(dat.tdy))
dat.tn <- t(dat.n)
col = c("black", "grey")
row_names <- rownames(dat.tn)
heatmap.2(dat.tn, scale = "none", Rowv = NA, Colv = NA, col = c("black", "grey"), margin=c(6, 4),trace='none',labRow = FALSE,
lhei=c(1,4),cexRow = 1,cexCol = 1,
lwid=c(.1,1), keysize=0.1, key.par = list(cex=0.5), sepwidth=c(0.1,0.1),
sepcolor="white",
colsep=1:ncol(dat),
rowsep=1:nrow(dat))
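One further thing worth checking, as an observation that is not part of the answer above: scale() standardises the data, so the matrix passed to heatmap.2 is no longer binary, and a strain whose values are all identical has a standard deviation of 0 and becomes NaN. The sketch below uses a small made-up 0/1 matrix (the values and names are assumptions for illustration only) to show the effect.

```r
# Small 0/1 matrix in the same layout as the question (strains x genes).
# The values are made up for illustration.
dat <- matrix(c(1, 1, 1,
                1, 0, 1,
                0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("strain", 1:3), paste0("gene", 1:3)))

# Same transformation as in the code above: standardise each strain, transpose back
scaled <- t(scale(t(dat)))

# After scaling the entries are no longer just 0 and 1
print(round(scaled, 3))

# strain1 is constant, so its standard deviation is 0 and its row becomes NaN
print(anyNA(scaled))

# The unscaled matrix keeps the original 0/1 values
raw <- as.matrix(dat)
print(raw)
```

If the goal is a strict two-color picture of the input, passing the unscaled matrix (raw here) to the plotting function instead of the scaled one may be the simpler option.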

I found much simpler code to plot the binary heatmap:
library(d3heatmap)
x<- read.csv("Dataset.csv", header = T, row.names = 1)
d3heatmap(x, Colv = NA,Rowv = NA, col = c("blue", "red"), scale="none", cexRow = 0.6,cexCol = 1)
Example dataset used:
RC C1 C2 C3 C4
R1 1 1 0 1
R2 0 1 1 0
R3 0 1 1 1
R4 1 1 1 0
R5 1 1 1 1
R6 0 0 0 1
R7 1 1 1 1
R8 1 1 1 1
R9 0 1 1 1
R10 1 1 0 0
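For comparison, here is a minimal sketch of the same two-color plot using only base graphics. It is not taken from the answer above; it assumes the example dataset is typed in as text rather than read from Dataset.csv, and it uses image() with explicit breaks so that 0 and 1 each map to exactly one color.

```r
# Example dataset from above, read from inline text instead of a file
x <- as.matrix(read.table(text = "
RC C1 C2 C3 C4
R1 1 1 0 1
R2 0 1 1 0
R3 0 1 1 1
R4 1 1 1 0
R5 1 1 1 1
R6 0 0 0 1
R7 1 1 1 1
R8 1 1 1 1
R9 0 1 1 1
R10 1 1 0 0
", header = TRUE, row.names = 1))

# image() draws the first index along the x axis, so flip the rows and
# transpose to get the usual layout with R1 at the top
z <- t(x[nrow(x):1, ])

# Two colors and three breaks: values below 0.5 are blue, values above are red
image(z, col = c("blue", "red"), breaks = c(-0.5, 0.5, 1.5), axes = FALSE)
axis(1, at = seq(0, 1, length.out = ncol(x)), labels = colnames(x))
axis(2, at = seq(0, 1, length.out = nrow(x)), labels = rev(rownames(x)), las = 1)
```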

Related

Get variable combination matrix

Data
We have numerous text strings that look like this (way longer in our real dataset):
df <- data.frame(
id = c('text1','text2','text3'),text = c('ABA','ABA','AAA')
)
>df
id text
1 text1 ABA
2 text2 ABA
3 text3 AAA
We want to create a matrix that tells how often a letter at position x is found together with the other letters at other positions, so in this case:
3A 3 1 2 3
2B 2 0 2 2
2A 1 1 0 1
1A 3 1 2 3
1A 2A 2B 3A
What I tried
I previously converted the matrix to a binary matrix, looking like this:
structure(list(pos1_A = c(1, 1, 1), pos2_A = c(0, 0, 1), pos2_B = c(1,
1, 0), pos3_A = c(1, 1, 1)), class = "data.frame", row.names = c("text1",
"text2", "text3"))
pos1_A pos2_A pos2_B pos3_A
text1 1 0 1 1
text2 1 0 1 1
text3 1 1 0 1
Then I can run commands like cor to get correlations. However, instead of correlations I want the frequencies.
Note that this is different from questions about co-occurrences in which the variable name itself (here, the position) is ignored, for example "How to use R to create a word co-occurrence matrix".
Huge credit to @Ronak Shah for the answer here.
It's much simpler if we convert the categorical data to a numerical (binary) matrix, for example using this hacky but easy way with the homals package, and then apply the method by @Ronak Shah linked above:
# Packages providing %>% and column_to_rownames()
library(magrittr)
library(tibble)
# The dataset
df <- data.frame(
id = c('text1','text2','text3'),text = c('ABA','ABA','AAA')
)
# Split the strings in characters and add column names
df2 <- df %>% splitstackshape::cSplit('text', sep = '', stripWhite = FALSE, type.convert = FALSE, direction = 'wide') %>%
column_to_rownames('id')
colnames(df2) <- paste0('pos', 1:ncol(df2))
# Convert to binary matrix (hacky way)
bin.mat <- homals:::expandFrame(df2, clean = FALSE)
# Method by @Ronak Shah to get the frequency matrix:
# count the rows in which both column x and column y are 1
fun <- function(x, y) sum(bin.mat[, x] & bin.mat[, y])
n <- seq_along(bin.mat)
mat <- outer(n, n, Vectorize(fun))
dimnames(mat) <- list(names(bin.mat)[n], names(bin.mat)[n])
This produces the matrix:
>mat
pos1_A pos2_A pos2_B pos3_A
pos1_A 3 1 2 3
pos2_A 1 1 0 1
pos2_B 2 0 2 2
pos3_A 3 1 2 3
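As a side note, for a 0/1 matrix the same counts can be obtained with a single matrix product, because entry (i, j) of t(X) %*% X is the number of rows in which columns i and j are both 1. The sketch below is an alternative to the outer() approach rather than the method from the linked answer; it starts from the binary data frame shown in the question.

```r
# Binary data frame from the question
bin.mat <- structure(list(pos1_A = c(1, 1, 1), pos2_A = c(0, 0, 1),
                          pos2_B = c(1, 1, 0), pos3_A = c(1, 1, 1)),
                     class = "data.frame",
                     row.names = c("text1", "text2", "text3"))

# crossprod(X) computes t(X) %*% X; for 0/1 data this is the co-occurrence count
freq <- crossprod(as.matrix(bin.mat))
print(freq)
```

The row and column names are carried over from the column names of bin.mat, so no separate dimnames() step is needed.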
Here's an alternative approach that produces a matrix as originally requested:
# Make all strings the same length:
df$text <- stringr::str_pad(df$text, side = "right", max(nchar(df$text)))
# Create a matrix with all letters labelled by their position:
all_vals <- apply(do.call(rbind, strsplit(df$text, "")), 1,
function(x) paste0(seq_along(x), x))
# Create a vector of all possible letter / position combos
all_labs <- do.call(paste0, expand.grid(seq(max(nchar(df$text))),
unique(unlist(strsplit(df$text, "")))))
# Create a function that will count all co-occurrences per data frame row
f <- function(y, x) as.vector(outer(x, x, function(a, b) 1 * (a %in% y & b %in% y)))
# Create the results matrix and label it
m <- matrix(rowSums(apply(as.data.frame(all_vals), 2, f, all_labs)), nrow = length(all_labs))
rownames(m) <- all_labs
colnames(m) <- all_labs
m
#> 1A 2A 3A 1B 2B 3B
#> 1A 3 1 3 0 2 0
#> 2A 1 1 1 0 0 0
#> 3A 3 1 3 0 2 0
#> 1B 0 0 0 0 0 0
#> 2B 2 0 2 0 2 0
#> 3B 0 0 0 0 0 0
Created on 2020-05-24 by the reprex package (v0.3.0)

Using sapply inside sapply

I need to execute sapply inside another sapply.
This is the working code I have.
animal <- c("Dog", "Cat", "Bird", "Fish", "Monkey", "Lion", "Dolphin", "Panda")
a <- as.data.frame(sapply(1:7, function(y) rbinom(30, 1, sample(seq(.4, .9, by=.1), 1, prob = NULL))))
colnames(a) <- (animal)
I would like to build this data frame 10 times without doing this:
animal <- c("Dog", "Cat", "Bird", "Fish", "Monkey", "Lion", "Dolphin", "Panda")
a <- as.data.frame(sapply(1:7, function(y) rbinom(30, 1, sample(seq(.4, .9, by=.1), 1, prob = NULL))))
colnames(a) <- (animal)
b <- as.data.frame(sapply(1:7, function(y) rbinom(30, 1, sample(seq(.4, .9, by=.1), 1, prob = NULL))))
colnames(b) <- (animal)
...
j <- as.data.frame(sapply(1:7, function(y) rbinom(30, 1, sample(seq(.4, .9, by=.1), 1, prob = NULL))))
colnames(j) <- (animal)
I have tried this without success:
sapply(letters[1:10], function(z) as.data.frame(sapply(1:7, function(y) rbinom(300, 1, sample(seq(.4, .9, by=.1), 1, prob = NULL)))), colnames(letters[1:10]) <- (animal))
Thanks
If you need to do this with two apply-type functions, you can do something like the code below.
Note that you have eight animals in animal but only create 7 columns, so I have shortened animal.
Using lapply for the outer loop always returns a list, which is a bit neater than sapply for what I understand you are trying to do.
animal <- c("Dog", "Cat", "Bird", "Fish", "Monkey", "Lion", "Dolphin")
lapply(1:10, function(x){
a <- as.data.frame(
sapply(1:7, function(y) rbinom(30, 1, sample(seq(.4, .9, by=.1), 1, prob = NULL)))
)
names(a) <- (animal)
a
})
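If the ten data frames should also carry the names a to j, as in the question, one possible variation is to wrap the inner step in a small function and use replicate() with simplify = FALSE. This is only a sketch: make_df is a hypothetical helper name, and set.seed() is there purely to make the example reproducible.

```r
set.seed(1)  # only to make this sketch reproducible
animal <- c("Dog", "Cat", "Bird", "Fish", "Monkey", "Lion", "Dolphin")

# Hypothetical helper: build one 30-row data frame with one column per animal,
# each column drawn with its own randomly chosen success probability
make_df <- function() {
  m <- sapply(animal, function(y) rbinom(30, 1, sample(seq(.4, .9, by = .1), 1)))
  as.data.frame(m)
}

# Repeat 10 times, keep the result as a list, and name the elements a to j
dfs <- setNames(replicate(10, make_df(), simplify = FALSE), letters[1:10])

print(names(dfs))
print(dim(dfs$a))
```

Individual data frames are then available as dfs$a, dfs$b and so on, without creating ten separate variables.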
You are using sapply, so it is not clear if you want the end result to be a matrix or a list. If you want a matrix as output, then a straightforward approach would be to use your existing code, but start with an expanded vector (animals x replicates).
animal.reps = sapply(c("Dog", "Cat", "Bird", "Fish", "Monkey", "Lion", "Dolphin", "Panda"), paste, letters[1:10], sep = ".")
a = sapply(animal.reps, function(y) rbinom(30, 1, sample(seq(.4, .9, by=.1), 1, prob = NULL)) )
This gives a 30x80 matrix:
> dim(a)
[1] 30 80
> a[1:10, 1:10]
Dog.a Dog.b Dog.c Dog.d Dog.e Dog.f Dog.g Dog.h Dog.i Dog.j
[1,] 1 1 1 1 1 0 1 1 1 0
[2,] 1 1 0 0 1 0 0 1 1 0
[3,] 1 0 1 1 1 0 1 1 1 0
[4,] 1 1 0 1 1 1 1 1 1 0
[5,] 1 1 1 0 1 1 0 1 1 0
[6,] 0 1 0 1 1 0 0 1 1 1
[7,] 1 1 0 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 0 1 1 1 1
[9,] 1 1 0 1 1 0 1 1 1 0
[10,] 0 1 1 1 1 1 1 1 1 1

How to plot only the 1 (one) elements of a binary matrix in R

I have a sparse matrix stored in a .csv file, and the matrix looks like this:
v1 v2 v3 v4 v5 v6 ... vn
1 0 1 0 1 0 0
2 0 0 0 1 0 0
3 0 0 0 0 1 0
4 1 0 0 0 0 1
5 1 0 1 0 1 0
...
m
I want to make a plot with x values v1 to vn and y values 1 to m,
marking only the non-zero elements (the 1s).
In Matlab I use spy(), but I don't know how to do this in R.
You can use the SparseM package:
m <- matrix(as.numeric(runif(100) > 0.9), ncol = 10) # create random sparse matrix
library(SparseM)
image(as.matrix.csr(m)) # plot it :)
Here is a solution using ggplot2::ggplot.
# Sample data
set.seed(2017);
df <- matrix(sample(c(0, 1), 100, replace = TRUE), nrow = 10);
df;
# Convert wide to long
library(reshape2);
df.long <- melt(df);
# Var1 = row
# Var2 = column
library(ggplot2);
gg <- ggplot(subset(df.long, value == 1), aes(x = Var2, y = Var1));
gg <- gg + geom_point(size = 2, fill = "blue", shape = 21);
gg <- gg + theme_bw();
gg <- gg + labs(y = "Row", x = "Column");
gg <- gg + scale_y_reverse();
gg; # display the plot
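A version without extra packages is also possible. The sketch below is an assumption about what a spy()-like plot could look like in base R, not part of either answer: which(..., arr.ind = TRUE) returns the row and column index of every 1, and those indices are plotted as points with the y axis reversed so that row 1 is at the top.

```r
# Sample data, generated the same way as in the ggplot2 example
set.seed(2017)
m <- matrix(sample(c(0, 1), 100, replace = TRUE), nrow = 10)

# Row and column index of every non-zero element (columns named "row" and "col")
idx <- which(m == 1, arr.ind = TRUE)

# Plot columns on x and rows on y; ylim is reversed so row 1 is at the top
plot(idx[, "col"], idx[, "row"],
     xlim = c(1, ncol(m)), ylim = c(nrow(m), 1),
     pch = 15, xlab = "Column", ylab = "Row")
```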

Converting counts to individual observations in r

I have a data set that looks as follows
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
I want to reshape the dataframe to look like this
# name judgement1 judgement2 judgement3
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# b 1 0 0
# b 0 1 0
# b 0 0 1
And so on. I have seen untable recommended in some other threads, but it does not appear to work with the current version of R. Is there a package that can convert summarised counts into individual observations?
You could try something like this:
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
rep.vec <- colSums(df[colnames(df) %in% paste0("judgement", (1:nrow(df)), sep="")], na.rm = TRUE)
want <- data.frame(name=df$name, cbind(diag(nrow(df))))
colnames(want)[-1] <- paste0("judgement", (1:nrow(df)), sep="")
(want <- want[rep(1:nrow(want), rep.vec), ])
I wrote a function that works to give you your desired output:
untabl <- function(df, id.col, count.cols) {
df[is.na(df)] <- 0 # replace NAs
out <- lapply(count.cols, function(x) { # for each column with counts
z <- df[rep(1:nrow(df), df[,x]), ] # replicate rows
z[, -c(id.col)] <- 0 # set all other columns to zero
z[, x] <- 1 # replace the count values with 1
z
})
out <- do.call(rbind, out) # combine the list
out <- out[order(out[,c(id.col)]),] # reorder (you can change this)
rownames(out) <- NULL # return to simple row numbers
out
}
untabl(df = df, id.col = 1, count.cols = c(2,3,4))
# name judgement1 judgement2 judgement3
#1 a 1 0 0
#2 a 1 0 0
#3 a 1 0 0
#4 a 1 0 0
#5 a 1 0 0
#6 a 0 1 0
#7 a 0 0 1
#8 a 0 0 1
#9 b 0 1 0
#10 b 0 0 1
And for your reference, reshape::untable consists of the following code:
function (df, num)
{
df[rep(1:nrow(df), num), ]
}
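The same row-replication idea can be applied cell by cell instead of column by column. The following is a sketch under the assumption that NA counts should be treated as 0 (as in the untabl function above); the variable names are illustrative only. Each non-zero count is expanded into that many indicator rows.

```r
df <- data.frame(name = c("a", "b", "c"),
                 judgement1 = c(5, 0, NA),
                 judgement2 = c(1, 1, NA),
                 judgement3 = c(2, 1, NA))

# Count matrix without the id column; NA is treated as 0 (assumption)
counts <- as.matrix(df[, -1])
counts[is.na(counts)] <- 0

# Position of every non-zero count, ordered by row and then by column
cells <- which(counts > 0, arr.ind = TRUE)
cells <- cells[order(cells[, "row"], cells[, "col"]), , drop = FALSE]

# Repeat each (row, col) position as many times as its count
times <- counts[cells]
rows <- rep(cells[, "row"], times)
cols <- rep(cells[, "col"], times)

# One indicator row per observation: a 1 in the judgement column, 0 elsewhere
ind <- diag(ncol(counts))[cols, , drop = FALSE]
colnames(ind) <- colnames(counts)
out <- data.frame(name = df$name[rows], ind)
print(out)
```

Row c contributes no observations because all of its counts are NA.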

how to create 3 by 3 Contingency table with two variables in R

Example:
x <- c( 1, NA, 0, 1)
y <- c(NA, NA, 0, 1)
table(x,y, useNA="always") # --->
# y
# x 0 1 <NA>
# 0 1 0 0
# 1 0 1 1
# <NA> 0 0 1
My question is:
a <- c(NA, NA, NA, NA)
b <- c(1, 1, 1, 1)
table(a, b, useNA="always") ## --> This is a 1x2 table.
# b
# a 1 <NA>
# <NA> 4 0
I want to get a 3x3 table with the same column names, row names and dimensions as in the example above. Then I will apply chisq.test to the table.
Thank you very much for your answers!
You can achieve this by converting both a and b into factors with the same levels. This works because factor vectors keep track of all possible values (aka levels) that their elements might take, even when they in fact contain just a subset of those.
a <- c(NA, NA, NA, NA)
b <- c(1, 1, 1, 1)
levs <- c(0, 1)
table(a = factor(a, levels = levs),
b = factor(b, levels = levs),
useNA = "always")
# b
# a 0 1 <NA>
# 0 0 0 0
# 1 0 0 0
# <NA> 0 4 0
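If this is needed for several pairs of vectors, the factor conversion can be wrapped in a small function so that every table comes out on the same 3x3 grid. This is a sketch only: table3x3 is a hypothetical helper name, and it assumes the data can only contain 0, 1 and NA.

```r
# Hypothetical helper: cross-tabulate two 0/1 vectors on a fixed grid of levels
table3x3 <- function(a, b, levs = c(0, 1)) {
  table(a = factor(a, levels = levs),
        b = factor(b, levels = levs),
        useNA = "always")
}

t1 <- table3x3(c(1, NA, 0, 1), c(NA, NA, 0, 1))       # first example from the question
t2 <- table3x3(c(NA, NA, NA, NA), c(1, 1, 1, 1))      # second example from the question

print(dim(t1))
print(dim(t2))
print(identical(dimnames(t1), dimnames(t2)))
```

Note that rows or columns consisting only of zeros give expected counts of zero, so the chi-squared statistic may not be defined for such a table even though its dimensions are now as requested.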

Resources