Two-dimensional heatmap with R

I have an input file of this form:
0.35217720 1 201 1
0.26413283 1 209 1
1.1665874 1 210 1
...
0.30815500 2 194 1
0.15407741 2 196 1
0.15407741 2 197 1
0.33016610 2 205 1
...
where the first column is a scalar value, the second is the x coordinate of a discrete lattice, the third is the y coordinate, and the last one is a time-like discrete component.
I would like to make a two-dimensional heatmap of the scalar values at a fixed time. How can I do this? Edit: I don't know how to make image() use the second and third columns as x, y coordinates.
Example file:
7.62939453 1 1 1
1.3153768 1 2 1
7.5560522 1 3 1
4.5865011 1 4 1
5.3276706 1 5 1
2.1895909 2 1 1
0.47044516 2 2 1
6.7886448 2 3 1
6.7929626 2 4 1
9.3469286 2 5 1
3.8350201 3 1 1
5.1941633 3 2 1
8.3096523 3 3 1
0.34571886 3 4 1
0.53461552 3 5 1
5.2970004 4 1 1
6.7114925 4 2 1
7.69805908 4 3 1
3.8341546 4 4 1
0.66842079 4 5 1
4.1748595 5 1 1
6.8677258 5 2 1
5.8897662 5 3 1
9.3043633 5 4 1
8.4616680 5 5 1

Reshape your data to a matrix and then use heatmap():
This worked on R version 2.10.1 (2009-12-14):
txt <- textConnection("7.62939453 1 1 1
1.3153768 1 2 1
7.5560522 1 3 1
4.5865011 1 4 1
5.3276706 1 5 1
2.1895909 2 1 1
0.47044516 2 2 1
6.7886448 2 3 1
6.7929626 2 4 1
9.3469286 2 5 1
3.8350201 3 1 1
5.1941633 3 2 1
8.3096523 3 3 1
0.34571886 3 4 1
0.53461552 3 5 1
5.2970004 4 1 1
6.7114925 4 2 1
7.69805908 4 3 1
3.8341546 4 4 1
0.66842079 4 5 1
4.1748595 5 1 1
6.8677258 5 2 1
5.8897662 5 3 1
9.3043633 5 4 1
8.4616680 5 5 1
")
df <- read.table(txt)
close(txt)
names(df) <- c("value", "x", "y", "t")
require(reshape)
dfc <- cast(df[ ,-4], x ~ y)
heatmap(as.matrix(dfc))

## Some copy/pasteable fake data for you (dput() works nicely for pasteable real data)
your_matrix <- cbind(runif(25, 0, 10), rep(1:5, each = 5), rep(1:5, 5), rep(1, 25))
## byrow = TRUE so that rows index x and columns index y, matching the loop below
heatmap_matrix <- matrix(your_matrix[, 1], nrow = 5, byrow = TRUE)
## alternatively, if your_matrix isn't in order
## (The reshape method in EDi's answer is a nicer alternative)
for (i in 1:nrow(your_matrix)) {
  ## place each value at its (x, y) position
  heatmap_matrix[your_matrix[i, 2], your_matrix[i, 3]] <- your_matrix[i, 1]
}
heatmap(heatmap_matrix) # one option
image(z = heatmap_matrix) # another option
require(gplots)
heatmap.2(heatmap_matrix) # this has fancier preferences
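
For the fixed-time part of the question and the image() edit, here is a rough base-R sketch (not taken from either answer above). It assumes the four columns are named value, x, y and t; the data frame here is made-up data on a 5 x 5 lattice with two time steps, and xs, ys, z and slice are just illustrative names. Lattice sites missing from the file are left as NA.

```r
# Made-up long-format data with the same layout as the question's file
set.seed(1)
df <- expand.grid(y = 1:5, x = 1:5, t = 1:2)
df$value <- runif(nrow(df), 0, 10)

# Keep a single time slice
slice <- df[df$t == 1, ]

# Build a matrix spanning the lattice; sites without data stay NA
xs <- sort(unique(slice$x))
ys <- sort(unique(slice$y))
z <- matrix(NA_real_, nrow = length(xs), ncol = length(ys))
z[cbind(match(slice$x, xs), match(slice$y, ys))] <- slice$value

# image() takes the coordinate vectors plus the matrix z[x, y]
image(x = xs, y = ys, z = z, xlab = "x", ylab = "y")
```

Indexing z with a two-column matrix of (row, column) positions avoids the explicit loop and does not depend on the rows of the file being sorted.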


How to keep only first value in every sequence of duplicated values in R [duplicate]

This question already has answers here:
Select first row in each contiguous run by group
(4 answers)
Closed 5 months ago.
I am trying to create a subset where I keep the first value in each sequence of numbers in a column. I tried to use:
df %>% group_by(x) %>% slice_head(n = 1)
But that keeps only one row per distinct value of x (the first instance of each value), not the first row of every run.
Example data, where the x column contains the repeated sequences, is shown below:
x = c(2,2,2,3,3,3,1,1,1,5,5,5,2,2,2,1,1,1,3,3,3)
y = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 2 1
3 2 1
4 3 1
5 3 1
6 3 1
7 1 1
8 1 1
9 1 1
10 5 1
11 5 1
12 5 1
13 2 1
14 2 1
15 2 1
16 1 1
17 1 1
18 1 1
19 3 1
20 3 1
21 3 1
So the end result that I would like to achieve is:
x = c(2,3,1,5,2,1,3)
y = c(1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 3 1
3 1 1
4 5 1
5 2 1
6 1 1
7 3 1
Could you please help or point me to any useful existing topics as I haven't managed to find it?
Thanks
You can try rleid from package data.table
> library(data.table)
> setDT(df)[!duplicated(rleid(x))]
x y
1: 2 1
2: 3 1
3: 1 1
4: 5 1
5: 2 1
6: 1 1
7: 3 1
Base R.
df[c(1, diff(df$x)) != 0, ]
Or also with helper functions from data.table.
library(data.table)
df[rowid(rleid(df$x)) == 1L, ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 13 2 1
# 16 1 1
# 19 3 1
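Using rle and match. Note that match() returns the first occurrence of each value, not the start of each run, so the repeated values map back to rows 1, 7 and 4 (see the row names below); this gives the desired result here only because y is identical in every row.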
df[match(with(rle(df$x), values), df$x), ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 1.1 2 1
# 7.1 1 1
# 4.1 3 1
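
If the other columns can differ between runs, the start index of each run can be computed directly from the run lengths. This is a small base-R sketch of that idea, using the data from the question; runs, starts and first_rows are illustrative names.

```r
df <- data.frame(
  x = c(2, 2, 2, 3, 3, 3, 1, 1, 1, 5, 5, 5, 2, 2, 2, 1, 1, 1, 3, 3, 3),
  y = 1
)

# Run-length encode x, then turn the run lengths into start positions:
# the first run starts at row 1, each later run starts where the previous one ended + 1
runs <- rle(df$x)
starts <- cumsum(c(1, head(runs$lengths, -1)))

# Keep only the first row of every run
first_rows <- df[starts, ]
first_rows
```

Unlike the match() variant, this picks rows 13, 16 and 19 for the second runs of 2, 1 and 3, so any other columns come from the correct rows.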

Creating list of all pairwise comparisons within data frame in R

From a data frame in R that has X, Y coordinates (see example), I would like to add two columns (final_X and final_Y) to show all possible pairwise comparisons between the two.
dt = data.frame(X = seq(1, 5, by=1), Y = seq(1, 5, by=1))
This is the final goal but there should be a row for every possible combination of x, y and final_x, final_y
You can use expand.grid:
eg <- expand.grid(final_Y = 1:5, Y = 1:5, final_X = 1:5, X = 1:5)[,c(4,2,3,1)]
head(eg, n=20)
# X Y final_X final_Y
# 1 1 1 1 1
# 2 1 1 1 2
# 3 1 1 1 3
# 4 1 1 1 4
# 5 1 1 1 5
# 6 1 2 1 1
# 7 1 2 1 2
# 8 1 2 1 3
# 9 1 2 1 4
# 10 1 2 1 5
# 11 1 3 1 1
# 12 1 3 1 2
# 13 1 3 1 3
# 14 1 3 1 4
# 15 1 3 1 5
# 16 1 4 1 1
# 17 1 4 1 2
# 18 1 4 1 3
# 19 1 4 1 4
# 20 1 4 1 5
nrow(eg)
# [1] 625
I defined the columns out of order and reordered them simply to match the ordering of your expected output. One could easily do expand.grid(X=,Y=,final_X=,final_Y=) and leave off the [,c(...)] and the effective results would be the same but in a different row-order.
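
To illustrate that last remark, here is a quick check that both ways of calling expand.grid produce the same set of rows. KEEP.OUT.ATTRS = FALSE is only there so the two data frames carry no extra attributes when compared, and sort_rows is a throwaway helper written for this sketch.

```r
# Columns defined out of order, then reordered (as in the answer)
eg1 <- expand.grid(final_Y = 1:5, Y = 1:5, final_X = 1:5, X = 1:5,
                   KEEP.OUT.ATTRS = FALSE)[, c(4, 2, 3, 1)]

# Columns defined directly in the desired order
eg2 <- expand.grid(X = 1:5, Y = 1:5, final_X = 1:5, final_Y = 1:5,
                   KEEP.OUT.ATTRS = FALSE)

# Helper: sort a data frame by all of its columns and reset the row names
sort_rows <- function(d) {
  d <- d[do.call(order, unname(as.list(d))), ]
  rownames(d) <- NULL
  d
}

# Same rows, only the original row order differs
same_rows <- isTRUE(all.equal(sort_rows(eg1), sort_rows(eg2)))
same_rows
```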

Looping through specified columns in a Matrix and replacing their values by subtracting the value from 4

I am new(ish) to R and I am still unsure about loops.
I have a large matrix object in R whose columns hold values from 0 to 4, and I would like to invert these values for specified columns.
For a single column I would use the code:
b[, "AX1"] <- 4 - b[, "AX1"]
Where b is a Matrix extracted from a larger list object and AX1 would be a column in the matrix.
I would then replace the changed Matrix back into its list using the code:
DF1$geno[[1]]$data <- b
How would I loop this code over a list of column names (AX1, AX10, AX42, ...), about 30 of the 5000 columns in the matrix, given a list containing the 30 column names to be inverted?
The simplest way you can do it (assuming that you always transform it the way x = 4 - x) is to expand your approach to the list of columns:
# Create an example dataset
set.seed(68859457)
(
  dat <- matrix(
    data = sample(x = 0:4, size = 100, replace = TRUE),
    nrow = 10,
    dimnames = list(1:10, paste('AX', 1:10, sep = ''))
  )
)
# AX1 AX2 AX3 AX4 AX5 AX6 AX7 AX8 AX9 AX10
# 1 2 1 2 3 2 2 3 1 0 4
# 2 4 3 4 4 0 1 3 1 3 4
# 3 3 0 3 4 2 2 4 1 2 1
# 4 2 2 0 2 4 2 2 1 1 0
# 5 4 4 4 3 3 1 0 3 2 2
# 6 2 1 1 0 3 3 4 4 1 0
# 7 2 3 1 3 3 1 0 1 0 4
# 8 2 2 1 1 0 3 1 3 2 1
# 9 3 1 4 1 2 1 0 0 4 1
# 10 4 3 2 4 1 0 2 0 3 2
# Create a list of columns you want to modify
set.seed(68859457)
(
cols_to_invert <- sort(sample(x = colnames(dat), size = 5))
)
# [1] "AX3" "AX4" "AX5" "AX6" "AX9"
# Use the list of columns created above to modify matrix in place
dat[, cols_to_invert] <- 4 - dat[, cols_to_invert]
# See the result
dat
# AX1 AX2 AX3 AX4 AX5 AX6 AX7 AX8 AX9 AX10
# 1 2 1 2 1 2 2 3 1 4 4
# 2 4 3 0 0 4 3 3 1 1 4
# 3 3 0 1 0 2 2 4 1 2 1
# 4 2 2 4 2 0 2 2 1 3 0
# 5 4 4 0 1 1 3 0 3 2 2
# 6 2 1 3 4 1 1 4 4 3 0
# 7 2 3 3 1 1 3 0 1 4 4
# 8 2 2 3 3 4 1 1 3 2 1
# 9 3 1 0 3 2 3 0 0 0 1
# 10 4 3 2 0 3 4 2 0 1 2
Difficult to tell without knowing exact structure of the data but based on your explanation and attempt maybe this will help.
cols <- c('AX1', 'AX10', 'AX42')
DF1$geno <- lapply(DF1$geno, function(x) {
  # assign back into the selected columns only, so the other columns are kept
  x$data[, cols] <- 4 - x$data[, cols]
  x
})
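
Since the real structure of DF1 is not shown, here is a self-contained sketch on a made-up stand-in: a list whose geno element holds two entries, each with a small data matrix. The structure, the helper make_data and the chosen column names are all assumptions for illustration only.

```r
# Made-up stand-in for the real object: each geno entry carries a $data matrix
set.seed(42)
make_data <- function() {
  matrix(sample(0:4, 20, replace = TRUE), nrow = 4,
         dimnames = list(NULL, paste0("AX", 1:5)))
}
DF1 <- list(geno = list(list(data = make_data()), list(data = make_data())))

cols <- c("AX1", "AX4")          # columns to invert
before <- DF1$geno[[1]]$data     # copy kept only to compare afterwards

# Invert the selected columns in every geno entry, leaving the rest untouched
DF1$geno <- lapply(DF1$geno, function(g) {
  g$data[, cols] <- 4 - g$data[, cols]
  g
})

DF1$geno[[1]]$data
```

No explicit loop over column names is needed, because the subtraction is vectorised over all selected columns at once.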

How to calculate size and location of multi clusters in matrix (R)

I have a matrix and I would like to know the center and min/max size of each cluster represented by the same number value.
For example, I would like to get the center position and size (or the min/max column/row) of the clusters represented by the number 2 in the following matrix. The idea is close to the one performed on an image in "How to obtain size of cluster of pixels in R" and "How to obtain size of multi clusters in matrix (R)".
But when I use the function apply(matrix2, 2, mean) and apply(matrix2, 2, range), results merge the two clusters. Is there a way to get each cluster ?
> matrix<- read.csv("2_ind_matrix.csv")
X1 X1.1 X1.2 X1.3 X1.4 X1.5
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 1
4 1 1 1 2 2 2
5 1 1 1 1 2 2
6 1 1 1 1 1 1
7 1 1 1 1 1 1
8 1 1 1 1 1 1
9 1 1 1 1 1 1
10 1 1 1 1 1 1
11 2 1 1 1 1 1
12 2 1 1 1 1 1
13 2 1 1 1 1 1
14 2 2 1 1 1 1
15 2 2 2 1 1 1
16 2 2 2 2 2 2
17 2 2 2 2 2 2
> matrix2<- which(matrix == 2, TRUE)
> apply(matrix2, 2, range) #Range
row col
[1,] 4 1
[2,] 17 6
> apply(matrix2, 2, mean) #Center
row col
13.16 3.20
You first need to decide how many clusters there are. Here I assume there are 2 clusters. They can be found with kmeans, using the positions returned by which.
y <- which(x==2, TRUE)
y <- cbind(y, cluster=kmeans(y, 2)$cluster)
aggregate(y[,1:2], list(y[,3]), range)
# Group.1 row.1 row.2 col.1 col.2
#1 1 4 5 4 6
#2 2 11 17 1 6
aggregate(y[,1:2], list(y[,3]), mean)
# Group.1 row col
#1 1 4.40 5.2
#2 2 15.35 2.7
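
If the number of clusters is not known in advance, one alternative (a different technique from the kmeans answer above, offered only as a sketch) is single-linkage hierarchical clustering cut at a height of 1.5: cells that touch, including diagonally, are at distance 1 or sqrt(2), so each group of connected cells ends up in its own cluster. The matrix m below is rebuilt by hand from the one printed in the question.

```r
# Rebuild the example matrix from the question (17 rows, 6 columns)
m <- matrix(1, nrow = 17, ncol = 6)
m[4, 4:6] <- 2
m[5, 5:6] <- 2
m[11:13, 1] <- 2
m[14, 1:2] <- 2
m[15, 1:3] <- 2
m[16:17, ] <- 2

# Row/column positions of every cell equal to 2
pos <- which(m == 2, arr.ind = TRUE)

# Single linkage cut at 1.5 groups cells that are adjacent (including diagonals)
cl <- cutree(hclust(dist(pos), method = "single"), h = 1.5)

# Per-cluster extent and center
aggregate(pos, list(cluster = cl), range)
aggregate(pos, list(cluster = cl), mean)
```

The cut height encodes the neighbourhood rule; using h = 1.1 instead would treat only cells sharing an edge as connected.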

Count with table() and exclude 0's

I try to count triplets; for this I use three vectors that are packed in a dataframe:
X=c(4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
Y=c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,3,4,2,2,2,2,3,4,1,1,2,2,3,3,4,4)
Z=c(4,4,5,4,4,4,4,4,6,1,1,1,1,1,1,1,2,2,2,2,7,2,3,3,3,3,3,3,3,3)
Count_Frame=data.frame(matrix(NA, nrow=(length(X)), ncol=3))
Count_Frame[1]=X
Count_Frame[2]=Y
Count_Frame[3]=Z
Counts=data.frame(table(Count_Frame))
There is the following problem: if I increase the value range in the vectors or use even more vectors the "Counts" dataframe quickly approaches its size limit due to the many 0-counts. Is there a way to exclude the 0-counts while generating "Counts"?
We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(Count_Frame)); grouped by all the columns (.(X, Y, Z)), we get the number of rows (.N).
library(data.table)
setDT(Count_Frame)[,.N ,.(X, Y, Z)]
# X Y Z N
# 1: 4 1 4 7
# 2: 4 1 5 1
# 3: 1 1 6 1
# 4: 1 1 1 3
# 5: 1 2 1 2
# 6: 1 3 1 1
# 7: 1 4 1 1
# 8: 2 2 2 4
# 9: 2 3 7 1
#10: 2 4 2 1
#11: 3 1 3 2
#12: 3 2 3 2
#13: 3 3 3 2
#14: 3 4 3 2
Instead of naming all the columns, we can use names(Count_Frame) as well (if there are many columns)
setDT(Count_Frame)[,.N , names(Count_Frame)]
You can accomplish this with aggregate:
Count_Frame$one <- 1
aggregate(one ~ X1 + X2 + X3, data=Count_Frame, FUN=sum)
This will calculate the positive instances of table, but will not list the zero counts.
One solution is to create a combination of the column values and count those instead:
library(tidyr)
as.data.frame(table(unite(Count_Frame, tmp, X1, X2, X3))) %>%
separate(Var1, c('X1', 'X2', 'X3'))
Resulting output is:
X1 X2 X3 Freq
1 1 1 1 3
2 1 1 6 1
3 1 2 1 2
4 1 3 1 1
5 1 4 1 1
6 2 2 2 4
7 2 3 7 1
8 2 4 2 1
9 3 1 3 2
10 3 2 3 2
11 3 3 3 2
12 3 4 3 2
13 4 1 4 7
14 4 1 5 1
Or using plyr:
library(plyr)
count(Count_Frame, colnames(Count_Frame))
output
# > count(Count_Frame, colnames(Count_Frame))
# X1 X2 X3 freq
# 1 1 1 1 3
# 2 1 1 6 1
# 3 1 2 1 2
# 4 1 3 1 1
# 5 1 4 1 1
# 6 2 2 2 4
# 7 2 3 7 1
# 8 2 4 2 1
# 9 3 1 3 2
# 10 3 2 3 2
# 11 3 3 3 2
# 12 3 4 3 2
# 13 4 1 4 7
# 14 4 1 5 1
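
To see how much the zero counts inflate the result, this sketch compares the full table() output with the aggregate approach from above on the question's data. The data frame is built directly with the default column names X1, X2, X3; sparse, full and nonzero are illustrative names.

```r
X <- c(4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
Y <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,3,4,2,2,2,2,3,4,1,1,2,2,3,3,4,4)
Z <- c(4,4,5,4,4,4,4,4,6,1,1,1,1,1,1,1,2,2,2,2,7,2,3,3,3,3,3,3,3,3)
Count_Frame <- data.frame(X1 = X, X2 = Y, X3 = Z)

# Counts for observed triplets only (aggregate approach)
Count_Frame$one <- 1
sparse <- aggregate(one ~ X1 + X2 + X3, data = Count_Frame, FUN = sum)

# Full contingency table: one row per combination of levels, mostly zeros
full <- as.data.frame(table(Count_Frame[, c("X1", "X2", "X3")]))
nonzero <- full[full$Freq > 0, ]

nrow(full)     # every combination of the 4 x 4 x 7 levels
nrow(sparse)   # only the triplets that actually occur
```

The full table grows with the product of the number of levels per column, while the grouped result grows only with the number of distinct triplets observed.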
