How to plot only the 1 (non-zero) elements of a binary matrix in R - r

I have a sparse matrix stored in a .csv file. The matrix looks like this:
v1 v2 v3 v4 v5 v6 ... vn
1 0 1 0 1 0 0
2 0 0 0 1 0 0
3 0 0 0 0 1 0
4 1 0 0 0 0 1
5 1 0 1 0 1 0
...
m
I want to make a plot with x values = v1~vn and y values = 1~m,
marking only the non-zero elements (the 1s).
In Matlab I use spy(), but I don't know how to do this in R.

You can use the SparseM package:
m <- matrix(as.numeric(runif(100) > 0.9), ncol = 10) # create random sparse matrix
library(SparseM)
image(as.matrix.csr(m)) # plot it :)

Here is a solution using ggplot2::ggplot.
# Sample data
set.seed(2017);
df <- matrix(sample(c(0, 1), 100, replace = TRUE), nrow = 10);
df;
# Convert wide to long
library(reshape2);
df.long <- melt(df);
# Var1 = row
# Var2 = column
library(ggplot2);
gg <- ggplot(subset(df.long, value == 1), aes(x = Var2, y = Var1));
gg <- gg + geom_point(size = 2, fill = "blue", shape = 21);
gg <- gg + theme_bw();
gg <- gg + labs(y = "Row", x = "Column");
gg <- gg + scale_y_reverse();
gg;  # print the plot
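If you would rather avoid extra packages, the same idea can be sketched in base R: find the positions of the non-zero cells with which(..., arr.ind = TRUE) and plot them with a reversed y axis, roughly like Matlab's spy(). The helper names spy_coords and spy_plot are made up for this example and do not come from any package.

```r
# Sketch of a spy()-like plot in base R.
# spy_coords() and spy_plot() are hypothetical helper names.
spy_coords <- function(m) {
  idx <- which(m != 0, arr.ind = TRUE)   # one row per non-zero cell
  data.frame(row = idx[, "row"], col = idx[, "col"])
}

spy_plot <- function(m) {
  xy <- spy_coords(m)
  plot(xy$col, xy$row,
       xlim = c(1, ncol(m)),
       ylim = c(nrow(m), 1),             # reversed y axis: row 1 at the top
       pch = 15, xlab = "Column", ylab = "Row")
  invisible(xy)
}

# Sample data (same construction as the ggplot2 answer)
set.seed(2017)
m <- matrix(sample(c(0, 1), 100, replace = TRUE), nrow = 10)
xy <- spy_plot(m)
```

For a matrix stored in a file, something like spy_plot(as.matrix(read.csv("file.csv", row.names = 1))) should work, assuming the first column holds the row labels (the file name here is only a placeholder).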

Related

Replace column values based on column name

I have a data frame with several binary variables: x1, x2, ... x100. I want to replace the entry 1 in each column with the number in the name of the column, i.e.:
data$x2[data$x2 == 1] <- 2
data$x3[data$x3 == 1] <- 3
data$x4[data$x4 == 1] <- 4
data$x5[data$x5 == 1] <- 5
...
How can I achieve this in a loop?
Using col:
# example data
set.seed(1); d <- as.data.frame(matrix(sample(0:1, 12, replace = TRUE), nrow = 3))
names(d) <- paste0("x", seq(ncol(d)))
d
# x1 x2 x3 x4
# 1 0 0 0 1
# 2 1 1 0 0
# 3 0 0 1 0
ix <- d == 1
d[ ix ] <- col(d)[ ix ]
d
# x1 x2 x3 x4
# 1 0 0 0 4
# 2 1 2 0 0
# 3 0 0 3 0
dplyr approach (using zx8754's data):
library(dplyr)
d %>%
mutate(across(starts_with('x'), ~ . * as.numeric(gsub('x', '', cur_column()))))
#> x1 x2 x3 x4
#> 1 0 0 0 4
#> 2 1 2 0 0
#> 3 0 0 3 0
Created on 2021-05-26 by the reprex package (v2.0.0)
Here is a base R solution with a lapply loop.
data[-1] <- lapply(names(data)[-1], function(k){
n <- as.integer(sub("[^[:digit:]]*", "", k))
data[data[[k]] == 1, k] <- n
data[[k]]
})
data
Test data.
set.seed(2021)
data <- replicate(6, rbinom(10, 1, 0.5))
data <- as.data.frame(data)
names(data) <- paste0("x", 1:6)
A solution based on a simple for loop is below (otherwise similar to the accepted answer using lapply):
for (i in 2:100) {
k <- paste0('x', i)
data[data[[k]] == 1, k] <- i
}
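If the column numbers do not match the column positions (for example x2, x5, x7, ...), the number can be taken from the column name instead. A minimal sketch, assuming every column name is a non-digit prefix followed by the number; the column names below are invented for the example:

```r
# Sketch: derive the replacement value from each column name.
set.seed(1)
d <- as.data.frame(matrix(sample(0:1, 12, replace = TRUE), nrow = 3))
names(d) <- paste0("x", c(2, 5, 7, 10))        # deliberately not 1, 2, 3, 4

# Strip the leading non-digits: "x10" -> 10
nums <- as.integer(sub("^[^0-9]+", "", names(d)))

# Replace every 1 in a column by that column's number; zeros stay as they are
out <- d
out[] <- Map(function(column, n) ifelse(column == 1, n, column), d, nums)
out
```

Because the data are strictly 0/1, this gives the same result as multiplying each column by its number.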

Better way to add elements to a data frame without looping in R

I want to create a data frame containing the odds ratio, its standard error, and confidence intervals in R.
I have a dataset similar to this one:
dat <- read.table(header = TRUE, text = "
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 target
0 0 1 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 0 1
1 1 1 1 1 0 0 0 0 0 0 0")
I created a function that calculates everything I need for a particular feature in the dataset, like so:
get_ci <- function(df, feature) {
tab <- table(df[[feature]], df$target)
a <- tab[1,1]
b <- tab[1,2]
c <- tab[2,1]
d <- tab[2,2]
odds_ratio <- (a/b)/(c/d)
standard_error <- sqrt(1/a + 1/b + 1/c + 1/d)
log_ci_lower <- log(odds_ratio) - 1.96 * standard_error
log_ci_upper <- log(odds_ratio) + 1.96 * standard_error
ci_lower <- exp(log_ci_lower)
ci_upper <- exp(log_ci_upper)
df <- data.frame(Feature = feature,
`Odds Ratio` = odds_ratio,
`Standard Error` = standard_error,
`Lower Bound CI` = ci_lower,
`Upper Bound CI` = ci_upper
)
}
I want to create a data frame that holds the odds ratio, standard error, and confidence interval for each feature (f1-f11). What is the most efficient way to do this?
I am currently creating an empty data frame and looping through the features to populate it, but I feel this is not the right way to do it. I looked at the apply functions, but I am not sure how to use them with the function I wrote.
I think the first table line in the function should be:
tab <- table(factor(df[[feature]], levels = 0:1), df$target)
Otherwise, if a particular column contains only 1's or only 0's, the next lines would break.
With that change, you can use lapply, passing the column names:
result <- do.call(rbind, lapply(paste0('f', 1:11), get_ci, df = dat))
Or using purrr's map_df:
library(purrr)
result <- map_df(paste0('f', 1:11), get_ci, df = dat)
Here's another solution.
get_ci <- function(x, target) {
tab <- table(factor(x, levels=0:1), target) #changed
...
ci_upper <- exp(log_ci_upper)
c(`Odds Ratio` = odds_ratio, # changed
`Standard Error` = standard_error,
`Lower Bound CI` = ci_lower,
`Upper Bound CI` = ci_upper
)
}
as.data.frame(apply(dat[,1:11], 2, function(x) { get_ci(x, dat$target) })) #changed
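For reference, here is a self-contained sketch of the whole workflow. It uses the usual Wald interval on the log scale, log(OR) ± 1.96 * SE, and fixes the factor levels so the 2x2 table always has both rows and columns. The function name get_ci_sketch and the simulated data are assumptions made for this example; the six-row sample above has too many empty cells to give finite estimates.

```r
# Sketch only: get_ci_sketch() is a hypothetical name, the data are simulated.
get_ci_sketch <- function(df, feature, z = 1.96) {
  tab <- table(factor(df[[feature]], levels = 0:1),
               factor(df$target, levels = 0:1))
  n00 <- tab[1, 1]; n01 <- tab[1, 2]
  n10 <- tab[2, 1]; n11 <- tab[2, 2]
  odds_ratio <- (n00 / n01) / (n10 / n11)   # same orientation as the question
  se <- sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)
  data.frame(Feature          = feature,
             `Odds Ratio`     = odds_ratio,
             `Standard Error` = se,
             `Lower Bound CI` = exp(log(odds_ratio) - z * se),
             `Upper Bound CI` = exp(log(odds_ratio) + z * se),
             check.names = FALSE)           # keep the spaces in column names
}

# Simulated binary data: 3 features and a target, 200 rows
set.seed(42)
sim <- as.data.frame(matrix(rbinom(200 * 3, 1, 0.5), ncol = 3))
names(sim) <- paste0("f", 1:3)
sim$target <- rbinom(200, 1, 0.5)

# One row per feature, bound into a single data frame
result <- do.call(rbind, lapply(paste0("f", 1:3), get_ci_sketch, df = sim))
result
```

Cells with a zero count still produce Inf or NaN here; whether to add a continuity correction in that case is a separate decision.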

Binary Data heatmap

Can anyone tell me how to plot a heatmap for binary data, similar to the heatmap in this question: Binary R heatmap still displays gradient?
I tried, but I suspect I am not reading the input file properly. Here is a link to the data I want to plot:
https://www.dropbox.com/s/7k1uskwrfuaugw3/Dataset.csv?dl=0
Here is a subset of my data-
Strains gene1 gene2 gene3 gene4 gene5
strain1 1 1 1 1 1
strain2 1 1 1 1 1
strain3 1 1 1 1 1
strain4 1 1 1 1 1
strain5 1 1 1 1 1
strain6 1 1 1 1 1
strain7 1 1 1 1 1
strain8 1 1 1 1 1
strain9 1 1 0 0 0
And the output I am getting:
library(gplots)
file1<- read.csv('Dataset.csv',header = T)
class(file1)
dat <- data.frame(file1)
dim(dat)
names(dat)
head(dat)
rownames(dat) <-dat$Strains
head(dat)
dim(dat)
head(dat)
dat.tdy <- dat[,2:26]
dat.n <- scale(t(dat.tdy))
dat.tn <- t(dat.n)
col = c("black", "grey")
row_names <- rownames(dat.tn)
heatmap.2(dat.tn, scale = "none", Rowv = NA, Colv = NA, col = c("black", "grey"), margin=c(6, 4),trace='none',labRow = row_names,
lhei=c(1,4),cexRow = 1,cexCol = 1,
lwid=c(.1,1), keysize=0.1, key.par = list(cex=0.5), sepwidth=c(0.1,0.1),
sepcolor="white",
colsep=1:ncol(dat),
rowsep=1:nrow(dat))
This code is running properly and giving an output, but when I cross check with the input file, I see that the color matrix in heatmap is different than the input file. For example in the heatmap, for gene7 there is only one black box, but it actually has almost 13 zeros in the input file.
I feel there is a simpler way to do it..but as I am new to R, I am not able to figure it out. I am doing something wrong in giving the input file. Please help.
Thanks
Please try this code; you made an error in the dimensions.
library(gplots)
file1<- read.csv('Dataset.csv',header = T,row.names = 1)
class(file1)
dat <- data.frame(file1)
dim(dat)
names(dat)
head(dat)
rownames(dat) <-dat$Strains
head(dat)
dim(dat)
head(dat)
dat.tdy <- dat[,1:25]
dat.n <- scale(t(dat.tdy))
dat.tn <- t(dat.n)
col = c("black", "grey")
row_names <- rownames(dat.tn)
heatmap.2(dat.tn, scale = "none", Rowv = NA, Colv = NA, col = c("black", "grey"), margin=c(6, 4),trace='none',labRow = FALSE,
lhei=c(1,4),cexRow = 1,cexCol = 1,
lwid=c(.1,1), keysize=0.1, key.par = list(cex=0.5), sepwidth=c(0.1,0.1),
sepcolor="white",
colsep=1:ncol(dat),
rowsep=1:nrow(dat))
I found much simpler code to plot the binary heatmap:
library(d3heatmap)
x<- read.csv("Dataset.csv", header = T, row.names = 1)
d3heatmap(x, Colv = NA,Rowv = NA, col = c("blue", "red"), scale="none", cexRow = 0.6,cexCol = 1)
Example dataset used-
RC C1 C2 C3 C4
R1 1 1 0 1
R2 0 1 1 0
R3 0 1 1 1
R4 1 1 1 0
R5 1 1 1 1
R6 0 0 0 1
R7 1 1 1 1
R8 1 1 1 1
R9 0 1 1 1
R10 1 1 0 0
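A base R alternative, as a rough sketch: pass the raw 0/1 matrix to image() with exactly two colours and explicit breaks, and do not call scale() first. scale() centres and rescales each column, so the values are no longer just 0 and 1, which may be part of why the colours did not match the input file. The example below uses the small dataset shown above.

```r
# Sketch: two-colour heatmap of a binary matrix with base graphics.
x <- read.table(header = TRUE, row.names = 1, text = "
RC C1 C2 C3 C4
R1 1 1 0 1
R2 0 1 1 0
R3 0 1 1 1
R4 1 1 1 0
R5 1 1 1 1
R6 0 0 0 1
R7 1 1 1 1
R8 1 1 1 1
R9 0 1 1 1
R10 1 1 0 0")
m <- as.matrix(x)

# image() draws z[i, j] at (x[i], y[j]), so transpose and flip the rows
# to get the first row of the table at the top of the plot.
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)),
      z = t(m[nrow(m):1, , drop = FALSE]),
      col = c("black", "grey"),        # 0 -> black, 1 -> grey
      breaks = c(-0.5, 0.5, 1.5),
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(m)), labels = colnames(m))
axis(2, at = seq_len(nrow(m)), labels = rev(rownames(m)), las = 1)
```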

Converting counts to individual observations in r

I have a data set that looks as follows
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
I want to reshape the dataframe to look like this
# name judgement1 judgement2 judgement3
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# b 1 0 0
# b 0 1 0
# b 0 0 1
And so on. I have seen that untable is recommended on some other threads, but it does not appear to work with the current version of r. Is there a package that can convert summarised counts into individual observations?
You could try something like this:
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
rep.vec <- colSums(df[colnames(df) %in% paste0("judgement", (1:nrow(df)), sep="")], na.rm = TRUE)
want <- data.frame(name=df$name, cbind(diag(nrow(df))))
colnames(want)[-1] <- paste0("judgement", (1:nrow(df)), sep="")
(want <- want[rep(1:nrow(want), rep.vec), ])
I wrote a function that works to give you your desired output:
untabl <- function(df, id.col, count.cols) {
df[is.na(df)] <- 0 # replace NAs
out <- lapply(count.cols, function(x) { # for each column with counts
z <- df[rep(1:nrow(df), df[,x]), ] # replicate rows
z[, -c(id.col)] <- 0 # set all other columns to zero
z[, x] <- 1 # replace the count values with 1
z
})
out <- do.call(rbind, out) # combine the list
out <- out[order(out[,c(id.col)]),] # reorder (you can change this)
rownames(out) <- NULL # return to simple row numbers
out
}
untabl(df = df, id.col = 1, count.cols = c(2,3,4))
# name judgement1 judgement2 judgement3
#1 a 1 0 0
#2 a 1 0 0
#3 a 1 0 0
#4 a 1 0 0
#5 a 1 0 0
#6 a 0 1 0
#7 a 0 0 1
#8 a 0 0 1
#9 b 0 1 0
#10 b 0 0 1
And for your reference, reshape::untable consists of the following code:
function (df, num)
{
df[rep(1:nrow(df), num), ]
}
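The same rep() trick can also expand all count columns in one go, without a loop. A minimal sketch, assuming NA counts should be treated as 0; expand_counts is a hypothetical name used only here:

```r
# Sketch: expand counts into one row per observation, one-hot encoded.
expand_counts <- function(df, id_col, count_cols) {
  counts <- as.matrix(df[count_cols])
  counts[is.na(counts)] <- 0
  n <- as.vector(t(counts))              # row 1 col 1, row 1 col 2, ..., row 2 col 1, ...
  row_id <- rep(rep(seq_len(nrow(df)), each = ncol(counts)), times = n)
  col_id <- rep(rep(seq_len(ncol(counts)), times = nrow(df)), times = n)
  onehot <- diag(ncol(counts))[col_id, , drop = FALSE]   # a single 1 per row
  colnames(onehot) <- colnames(counts)
  out <- data.frame(df[[id_col]][row_id], onehot)
  names(out)[1] <- id_col
  rownames(out) <- NULL
  out
}

df <- data.frame(name = c("a", "b", "c"),
                 judgement1 = c(5, 0, NA),
                 judgement2 = c(1, 1, NA),
                 judgement3 = c(2, 1, NA))
long <- expand_counts(df, "name", c("judgement1", "judgement2", "judgement3"))
long
```

Rows come out grouped by the original row and then by column, so no reordering step is needed.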

How do I read data from txt file to create a 2D matrix?

I am writing a function in R which reads a text file with some header information in the first 8 lines and then the actual data starts. Here is how the file looks:
Line 1 to
.....
Line 10 (header information)
0 0 4.169080e+000
1 0 6.391669e+000
2 0 6.391669e+000
.
.
.
511 0 9.922547e+000
0 1 5.268026e+000
1 1 5.268026e+000
.
.
.
511 511 9.922547e+000
I have extracted information from the lines which are part of the header. Line 9 onwards the line format is:
x y value
I want to read all these lines one by one (line 11 onwards) and form a 2D matrix(dimensions: 511 X 511) of the value column so that later I can generate an image using this. Can someone help me how I organize this in a matrix? I am trying to use a yLoop and a nested xLoop but it is not working.
Can't you just create a matrix with 511 rows?
v <- rnorm(511*511, 3, 1)
matrix(v, nrow = 511, ncol = 511)
For something smaller:
v<- rnorm(4*4, 3,1)
> matrix(v, nrow = 4, ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 1.944165 4.263226 2.700559 3.672780
[2,] 3.932594 1.742278 3.733877 3.115301
[3,] 5.224144 1.139828 2.979448 3.402496
[4,] 3.619015 4.248993 2.667306 2.168456
library(raster)
rv <- raster(matrix(v, nrow = 4, byrow = F)) # you'll want it with the default byrow = F
plot(rv)
If you have something like this
x y v
0 0 4.169080e+000
1 0 6.391669e+000
2 0 6.391669e+000
511 0 9.922547e+000
0 1 5.268026e+000
1 1 5.268026e+000
511 511 9.922547e+000
and reading it just from the clipboard
d <- read.table(text=readClipboard(), header=T)
you'll get something like this
> d
x y v
1 0 0 4.169080
2 1 0 6.391669
3 2 0 6.391669
4 511 0 9.922547
5 0 1 5.268026
6 1 1 5.268026
7 511 511 9.922547
and d$v will be your data to plot.
You can handle raster resolution and coordinate reference system.
You possibly have a reference layer x to read resolution and crs I suppose. If so, use something like
v <- matrix(d$v, nrow = 4, byrow = F)
rv <- raster(v, xmn=x@extent@xmin, ymn=x@extent@ymin,
xmx=x@extent@xmax, ymx=x@extent@ymax,
crs = CRS(proj4string(x)))
With your data:
myfolder <- 'D:/temp'
d <- read.table(file.path(myfolder, 'sample.txt'), header = F, skip = 9, sep = '')
> head(d)
V1 V2 V3
1 0 0 12
2 1 0 7
3 2 0 10
4 3 0 11
5 4 0 8
6 5 0 9
rv <- raster(nrows=100, ncols=100)
rv[] <- matrix(d$V3, nrow = 100, byrow = F)
plot(rv)
Considering that the image is 1 x 1mm, you could try
rv1 <- raster(matrix(d$V3, nrow = 100, byrow = T), # I'm changing how d$V3 is arranged
xmn=0, ymn=0,
xmx=1, ymx=1)
spplot(rv1, scales = list(draw = TRUE))
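If you do not want to rely on the order of the lines in the file, you can also fill the matrix from the x and y columns directly. A minimal sketch in base R, assuming zero-based x/y indices and nine header lines as in the read.table() call above; the temporary file and its contents are invented for the example:

```r
# Sketch: build a matrix from "x y value" lines via matrix indexing.
# Write a small fake file: 9 header lines, then a 4 x 3 grid (x varies fastest).
tmp <- tempfile(fileext = ".txt")
grid <- expand.grid(x = 0:3, y = 0:2)
grid$value <- seq_len(nrow(grid))
writeLines(c(paste("header line", 1:9),
             paste(grid$x, grid$y, grid$value)), tmp)

d <- read.table(tmp, header = FALSE, skip = 9,
                col.names = c("x", "y", "value"))

# Rows of the matrix correspond to y, columns to x (both shifted to 1-based)
m <- matrix(NA_real_, nrow = max(d$y) + 1, ncol = max(d$x) + 1)
m[cbind(d$y + 1, d$x + 1)] <- d$value

# image() expects z[i, j] at (x[i], y[j]), hence the transpose
image(x = seq_len(ncol(m)) - 1, y = seq_len(nrow(m)) - 1, z = t(m),
      xlab = "x", ylab = "y")
unlink(tmp)
```

Any cell that never appears in the file stays NA, which makes missing lines easy to spot.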

Resources