Convert from long to symmetrical square wide format in R

I would like to convert this dataframe
tmp <- data.frame(V1=c("A","A","B"),V2=c("B","C","C"),V3=c(0.2,0.4,0.1))
tmp
V1 V2 V3
1 A B 0.2
2 A C 0.4
3 B C 0.1
into a square matrix like this (which should ultimately become a dist object):
A B C
A 0
B 0.2 0
C 0.4 0.1 0
I tried different approaches based on reshape, spread, and xtabs, but I cannot get the right dimensions. Thanks for your help.

Maybe you can try the code below
d <- sort(unique(unlist(tmp[1:2])))
m <- `dimnames<-`(matrix(0,length(d),length(d)),list(d,d))
m[as.matrix(tmp[1:2])] <- tmp$V3
res <- t(m) + m
such that
> res
A B C
A 0.0 0.2 0.4
B 0.2 0.0 0.1
C 0.4 0.1 0.0
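Since the question ultimately wants a dist object, res can be converted directly (a small follow-up sketch):
as.dist(res)
#     A   B
# B 0.2
# C 0.4 0.1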

You can also create your own dist object this way using structure (note that this assumes tmp$V3 is already ordered column-wise along the lower triangle, as it is here):
tmp_lab <- unique(c(as.character(tmp$V1), as.character(tmp$V2)))
structure(tmp$V3,
Size = length(tmp_lab),
Labels = tmp_lab,
Diag = TRUE,
Upper = FALSE,
method = "user",
class = "dist")
Output
A B C
A 0.0
B 0.2 0.0
C 0.4 0.1 0.0

Here is an option with xtabs after converting the columns 'V1' and 'V2' to factors with the same levels:
tmp[1:2] <- lapply(tmp[1:2], factor, levels = c('A', 'B', 'C'))
as.dist(xtabs(V3 ~ V2 + V1, tmp), diag = TRUE)
# A B C
#A 0.0
#B 0.2 0.0
#C 0.4 0.1 0.0
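If the labels are not known in advance, the same idea works with levels derived from the data (a small sketch, assuming tmp is the original data frame from the question):
lev <- sort(unique(c(as.character(tmp$V1), as.character(tmp$V2))))
tmp[1:2] <- lapply(tmp[1:2], factor, levels = lev)
as.dist(xtabs(V3 ~ V2 + V1, tmp), diag = TRUE)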

Related

How to collect outputs of a multivariable vector-valued function into a dataframe?

I have a function f1 that takes a pair of real numbers (x, y) and returns a triple of real numbers. I would like to collect all outputs of this function for all x in a vector a and y in a vector b. Could you please elaborate on how to do so?
f1 <- function(x, y){
return (c(x+y, x-y, x*y))
}
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)
Update: I mean for all pairs $(x, y) \in a \times b$.
Here is a data.table option:
library(data.table)
setDT(expand.grid(a, b))[, fval := do.call(Vectorize(f1, SIMPLIFY = FALSE), unname(.SD))][]
where expand.grid + do.call + Vectorize are used, giving
Var1 Var2 fval
1: 0.0 0.0 0,0,0
2: 0.1 0.0 0.1,0.1,0.0
3: 0.2 0.0 0.2,0.2,0.0
4: 0.3 0.0 0.3,0.3,0.0
5: 0.4 0.0 0.4,0.4,0.0
---
2012: 2.7 6.2 8.90,-3.50,16.74
2013: 2.8 6.2 9.00,-3.40,17.36
2014: 2.9 6.2 9.10,-3.30,17.98
2015: 3.0 6.2 9.2,-3.2,18.6
2016: 3.1 6.2 9.30,-3.10,19.22
A more compact version uses CJ(a, b) instead of setDT(expand.grid(a, b)) (thanks to @akrun's advice); see the sketch below.
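For reference, the CJ() form might look like this (a sketch; note that CJ() sorts its inputs by default, and the columns are named explicitly here to mirror expand.grid):
library(data.table)
CJ(Var1 = a, Var2 = b)[, fval := do.call(Vectorize(f1, SIMPLIFY = FALSE), unname(.SD))][]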
We can use expand.grid to create all combinations of the 'a' and 'b' values, then loop over the rows with apply (MARGIN = 1) and apply f1:
out <- as.data.frame(t(apply(expand.grid(a, b), 1, function(x) f1(x[1], x[2]))))
Or with tidyverse
library(dplyr)
library(purrr)
library(tidyr)
out2 <- crossing(x = a, y = b) %>%
pmap_dfr(f2)
Output
head(out2)
# A tibble: 6 x 3
# add subtract multiply
# <dbl> <dbl> <dbl>
#1 0 0 0
#2 0.1 -0.1 0
#3 0.2 -0.2 0
#4 0.3 -0.3 0
#5 0.4 -0.4 0
#6 0.5 -0.5 0
where f2 is:
f2 <- function(x, y){
return (tibble(add = x+y, subtract = x-y, multiply = x*y))
}
It may be better to have the function return a named list or tibble, so that the resulting columns are easier to work with.
Create all possible combinations with expand.grid and use Map to apply f1 to every pair.
val <- expand.grid(a, b)
result <- do.call(rbind, Map(f1, val$Var1, val$Var2))
head(result)
# [,1] [,2] [,3]
#[1,] 0.0 0.0 0
#[2,] 0.1 0.1 0
#[3,] 0.2 0.2 0
#[4,] 0.3 0.3 0
#[5,] 0.4 0.4 0
#[6,] 0.5 0.5 0
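If you also want the inputs alongside the outputs, one option is to bind the grid and the result together (a small sketch; the column names here are only illustrative):
out <- cbind(val, result)
names(out) <- c("x", "y", "sum", "difference", "product")
head(out)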

Normalize blocks/sub-matrices within a matrix

I want to normalize (i.e., scale to 0-1) blocks/sub-matrices within a square matrix based on row/column names. It is important that the normalized matrix corresponds to the original matrix. The code below extracts a block, e.g. all cells whose row and column names are "A", and normalizes it by its max value. How do I put that matrix of normalized blocks back together so that it corresponds to the original matrix, i.e. so that each value of the normalized blocks ends up in the same place as in the original matrix? You cannot simply put the blocks together and then, for example, sort the normalized matrix by the original matrix's row/col names.
#dummy code
mat <- matrix(round(runif(90, 0, 50),),9,9)
rownames(mat) <- rep(LETTERS[1:3],3)
colnames(mat) <- rep(LETTERS[1:3],3)
mat.n <- matrix(0,nrow(mat),ncol(mat), dimnames = list(rownames(mat),colnames(mat)))
for(i in 1:length(LETTERS[1:3])){
? <- mat[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]] / max(mat[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]])
#For example,
mat.n[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]] <- # doesn't work
}
UPDATE
Using ave() as @G. Grothendieck suggested works for the blocks, but I'm not sure how it normalizes beyond that.
mat.n <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
Within a block the normalization works, e.g.
mat[rownames(mat)=="A",colnames(mat)=="A"]
A A A
A 13 18 15
A 38 33 41
A 12 18 47
mat.n[rownames(mat.n)=="A",colnames(mat.n)=="A"]
A A A
A 0.2765957 0.3829787 0.3191489
A 0.8085106 0.7021277 0.8723404
A 0.2553191 0.3829787 1.0000000
But beyond that, it looks weird.
> round(mat.n,1)
A B C A B C A B C
A 0.3 0.2 0.1 0.4 0.2 1.0 0.3 0.9 1.0
B 0.9 0.8 0.9 0.4 0.5 0.4 0.4 0.9 0.0
C 0.0 0.4 0.4 0.0 0.8 0.5 0.4 0.9 0.0
A 0.8 0.9 0.5 0.7 0.9 0.6 0.9 0.4 0.4
B 0.1 0.8 0.7 1.0 0.3 0.5 0.1 1.0 0.8
C 0.4 0.0 0.2 0.2 0.2 0.6 1.0 0.4 1.0
A 0.3 0.4 0.3 0.4 0.6 0.8 1.0 1.0 0.3
B 0.6 0.2 0.5 0.9 0.3 0.2 0.9 0.3 1.0
C 0.5 0.9 0.7 1.0 0.4 0.5 1.0 1.0 0.9
In this case, I would expect 3 ones across the whole matrix, 1 for each block. But there are 10 ones, e.g. mat.n[3,2] and mat.n[1,9]. I'm not sure how this function normalizes between blocks.
UPDATE 2
#Original matrix.
#Suggested solution produces `NaN`
mat <- as.matrix(read.csv(text=",1.21,1.1,2.2,1.1,1.1,1.21,2.2,2.2,1.21,1.22,1.22,1.1,1.1,2.2,2.1,2.2,2.1,2.2,2.2,2.2,1.21,2.1,2.1,1.21,1.21,1.21,1.21,1.21,2.2,1.21,2.2,1.1,1.22,1.22,1.22,1.22,1.21,1.22,2.1,2.1,2.1,1.22
1.21,0,0,0,0,0,0,0,0,292,13,0,0,0,0,0,0,0,0,0,0,22,0,0,94,19,79,0,9,0,126,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,0,0,155,166,0,0,0,0,0,0,4,76,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,34,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,201,0,0,79,0,0,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,33,0,91,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,8,0,0,0,0,0,0,0,404,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,37,26,18,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,162,79,1,0,0,0,0,0,0,0,0,10,0,27,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,33,17,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,207,0,0,0,0,1644,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,16,17,402,0,0,0,606,0,0,0,0,0,0,0,0,0,0,0,0
1.22,13,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26,0,0,15,0,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,374,6,121,6,21,0,0,0,0
1.1,0,0,0,44,0,0,0,0,0,0,0,0,103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,33,0,0,0,0,0,0,0,0,0,0
1.1,0,0,0,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,18,0,0,0,0,353,116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,0,5,0
2.2,0,0,0,0,0,0,0,37,0,0,0,0,0,4,0,0,0,36,46,62,0,0,0,0,0,0,0,0,0,0,73,0,0,0,0,0,0,1,0,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61,0,0,0,0,0,0,0,38,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2.2,17,0,23,0,0,0,444,65,0,0,0,0,0,0,0,78,0,0,42,30,15,0,0,0,0,0,0,0,4,0,18,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,75,8,0,0,0,0,0,0,0,87,0,74,0,85,0,0,0,0,0,0,0,0,1,0,19,0,25,0,0,0,0,0,0,0,0,0
2.2,0,0,13,0,0,0,12,20,0,0,0,0,0,0,0,118,0,29,92,0,25,0,0,0,0,0,0,0,0,0,16,0,48,0,0,0,0,0,0,0,0,0
1.21,14,0,1,0,0,0,0,0,17,0,0,0,0,0,0,0,0,0,0,14,0,0,0,0,0,0,0,0,3,0,20,0,0,0,0,0,0,0,0,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,204,0,0,0,0,0,0,0,133,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,44,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,67,0,0,0,0,0,0,143,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,12,15,0
1.21,79,0,0,0,0,0,0,0,34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,38,26,6,9,0,112,0,0,0,0,0,0,0,0,0,0,0,0
1.21,11,0,0,0,0,17,0,0,49,0,0,0,0,0,0,0,0,0,0,0,0,0,0,28,0,0,0,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,40,0,0,0,0,0,0,0,122,0,0,0,0,0,0,0,0,0,0,0,3,0,0,24,11,0,887,20,0,389,0,0,0,0,0,0,0,0,0,0,0,0
1.21,14,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,50,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,34,0,0,0,0,26,0,0,56,0,0,0,0,0,0,0,0,0,0,0,0,0,0,54,9,297,13,0,0,16,0,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,39,0,0,0,0,0,0,0,0,25,0,17,12,20,25,0,0,0,0,0,0,0,0,0,393,0,7,0,0,0,0,0,0,0,0,0
1.21,177,0,0,0,0,8,0,0,775,0,0,0,0,0,0,0,0,0,0,0,0,0,0,113,0,227,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,21,17,0,0,0,0,0,0,0,0,0,42,30,16,0,0,0,0,0,0,0,0,165,0,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,6,0,28,0,0,0,0,0,0,0,9,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,4,37,0,0,0,0,0,0,0,0,3,0,0,0,0,14,7,0,0,18,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,44,785,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,21,0,44,177,13,24,0,0,0,0
1.22,0,0,0,0,0,0,30,0,0,182,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,12,0,1231,135,17,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73,1308,0,669,16,0,0,0,8
1.21,0,0,0,0,0,0,0,0,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,33,197,626,0,44,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,37,12,80,0,0,0,0,16
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,54,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,27,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,58,0,1,0,0,0,0,28,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61,2,0,0
1.22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,9,0,0,0,0"))
ids <- read.csv(text=",x
1,1.21
2,1.1
3,2.2
4,1.1
5,1.1
6,1.21
7,2.2
8,2.2
9,1.21
10,1.22
11,1.22
12,1.1
13,1.1
14,2.2
15,2.1
16,2.2
17,2.1
18,2.2
19,2.2
20,2.2
21,1.21
22,2.1
23,2.1
24,1.21
25,1.21
26,1.21
27,1.21
28,1.21
29,2.2
30,1.21
31,2.2
32,1.1
33,1.22
34,1.22
35,1.22
36,1.22
37,1.21
38,1.22
39,2.1
40,2.1
41,2.1
42,1.22")
mat <- mat[,-1]
rownames(mat) <- ids$x
colnames(mat) <- ids$x
ans <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
Any help is much appreciated, thanks.
Use ave to get the maxima:
mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
For example, there are 9 ones, as expected, and there is one 1 in each block, also as expected. (There could be more than 9 if the matrix happened to have multiple maxima in one or more blocks, but there should not be fewer than 9.)
set.seed(123)
mat <- matrix(round(runif(90, 0, 50),),9,9)
rownames(mat) <- rep(LETTERS[1:3],3)
colnames(mat) <- rep(LETTERS[1:3],3)
ans <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
sum(ans == 1)
## [1] 9
# there are no duplicates (i.e. a block showing up more than once) hence
# there is exactly one 1 in each block
w <- which(ans == 1, arr.ind = TRUE)
anyDuplicated(cbind(rownames(mat)[w[, 1]], colnames(mat)[w[, 2]]))
## [1] 0
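To see the grouping that ave() uses here: each cell belongs to the block defined by its (row label, column label) pair, and with three labels there are nine such blocks of nine cells each (a quick sketch):
table(rownames(mat)[row(mat)], colnames(mat)[col(mat)])
##     A B C
##   A 9 9 9
##   B 9 9 9
##   C 9 9 9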
ADDED
If some blocks are entirely zero (which is the case in UPDATE 2), then you will get NaNs for those blocks. If you want 0s instead for the all-zero blocks, try this:
xmax <- function(x) if (all(x == 0)) 0 else x/max(x)
ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = xmax)
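A quick check on the UPDATE 2 data (a sketch, assuming mat has been rebuilt with the row/column names set as above):
ans2 <- ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = xmax)
any(is.nan(ans2))  # FALSE: the all-zero blocks now contain 0 instead of NaN
range(ans2)        # all values stay within [0, 1]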

Reordering rows and columns in R

I know this has been answered before, but,
given a correlation matrix which looks like this:
V A B C D
A 1 0.3 0.1 0.4
B 0.2 1 0.4 0.3
C 0.1 0 1 0.9
D 0.3 0.3 0.1 1
which can be loaded in R as follows:
corr.matrix <- read.table("path/to/file", sep = '\t', header = T)
rownames(corr.matrix) <- corr.matrix$V
corr.matrix <- corr.matrix[, 2:ncol(corr.matrix)]
Based on 2 other files that dictate which rows and columns should be plotted (because some are of no interest to me), I want to rearrange the rows and columns into the order the 2 separate files dictate.
For example:
cols_order.txt
C
D
E
B
A
...
rows.txt
D
E
Z
B
T
A
...
I read those other 2 files like this:
rows.order <- read.table("rows_order.txt", sep = '\n', header = FALSE)
colnames(rows.order) <- "Variant"
cols.order <- read.table("cols_order.txt", sep = '\n', header = FALSE)
colnames(cols.order) <- "Variant"
And after this step I do this:
corr.matrix <- corr.matrix[rows.order$Variant, cols.order$Variant]
The values that I don't want to be plotted are successfully removed, but the order gets scrambled. How can I fix this?
The .order datasets are read correctly (I checked 3 times).
Here is a potential solution. I tried to re-create a small data.frame based on your question. The key here is the match function, along with some basic subsetting/filtering techniques in R:
## Re-create your example:
V <- data.frame(
A = c(1 , 0.3, 0.1 , 0.4),
B = c(0.2, 1 , 0.4 , 0.3),
C = c(0.1, 0 , 1 , 0.9),
D = c(0.3, 0.3, 0.1 , 1)
) #matrix() also ok
rownames(V) <- LETTERS[1:4]
## Reorder using `match` function
## Needs to be in data.frame form
## So use as.data.frame() if needed
## Here, I don't have the text files,
## so if you want to load txt files specifying the rows and columns,
## use `read.csv` or `read.table` to load them
## and then store the relevant info into a vector as you did
col_order <- c("C","D","E","B","A")
col_order_filtered <- col_order[which(col_order %in% colnames(V))]
rows <- c("D","E","Z","B","T","A")
## Filter rows IDs, since not all are present in your data
row_filtered <- rows[rows %in% rownames(V)]
V1 <- V[match(rownames(V), row_filtered), match(colnames(V), col_order_filtered)]
V1 <- V1[-which(rownames(V1)=="NA"), ]
V1
## D C A B
## C 0.1 1.0 0.1 0.4
## B 0.3 0.0 0.3 1.0
## A 0.3 0.1 1.0 0.2
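A simpler route (a sketch using the filtered vectors from above): character subsetting keeps the order of the index vectors, so they can be used directly:
V[row_filtered, col_order_filtered]
##     C   D   B   A
## D 0.9 1.0 0.3 0.4
## B 0.0 0.3 1.0 0.3
## A 0.1 0.3 0.2 1.0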
Alternatively, if you are comfortable with the dplyr package and its syntax, you can use it; it is often handy:
## Continued from previous code
library(dplyr)
V2 <- V %>%
select(C, D, B, A, everything()) %>%
slice(match(rownames(V), row_filtered))
rownames(V2) <- row_filtered
V2
## C D B A
## D 1.0 0.1 0.4 0.1
## B 0.0 0.3 1.0 0.3
## A 0.1 0.3 0.2 1.0
Hope that helps.

Connect two matrices by columns and extract a sub-matrix

I have two matrices (e.g., A and B). I would like to extract columns of B based on the order of A's first column:
For example
matrix A
name score
a 0.1
b 0.2
c 0.1
d 0.6
matrix B
a d b c g h
0.1 0.2 0.3 0.4 0.6 0.2
0.2 0.1 0.4 0.7 0.1 0.1
...
I want matrix B to look like this at the end
matrix B_modified
a b c d
0.1 0.3 0.4 0.2
0.2 0.4 0.7 0.1
Can this be done in either Perl or R? Thanks a lot in advance.
I've no idea what problems you're facing. Here's how I've done it.
## get data as matrix
a <- read.table(header=TRUE, text="name score
a 0.1
b 0.2
c 0.1
d 0.6", stringsAsFactors=FALSE) # load directly as characters
b <- read.table(header=TRUE, text="a d b c g h
0.1 0.2 0.3 0.4 0.6 0.2
0.2 0.1 0.4 0.7 0.1 0.1", stringsAsFactors=FALSE)
a <- as.matrix(a)
b <- as.matrix(b)
Now subset to get your final result:
b[, a[, "name"]]
# a b c d
# [1,] 0.1 0.3 0.4 0.2
# [2,] 0.2 0.4 0.7 0.1
The error:
`[.data.frame`(b, , a[, "name"]) : undefined columns selected
means that you are trying to select a column that exists in a$name but is not defined in b. One solution is to use intersect with colnames(b). This also converts the factor to a string, and you get the right order.
b[, intersect(a[, "name"],colnames(b))] ## the order is important here
For example, I tested this with this data:
b <- read.table(text='
a d b c
0.1 0.2 0.3 0.4
0.2 0.1 0.4 0.7',header=TRUE)
a <- read.table(text='name score
a 0.1
z 0.5
c 0.1
d 0.6',header=TRUE)
b[, intersect(a[, "name"],colnames(b))]
a c d
1 0.1 0.4 0.2
2 0.2 0.7 0.1
If your data originates as an R data structure then it would be perverse to export it and solve this problem using Perl. However, if you have text files that look like the data you have shown, then here is a Perl solution for you.
I have split the output on spaces. That can be changed very simply if necessary.
use strict;
use warnings;
use autodie;

sub read_file {
    my ($name) = @_;
    open my $fh, '<', $name;
    my @data = map [ split ], <$fh>;
    \@data;
}

my $matrix_a = read_file('MatrixA.txt');
my @fields = map $matrix_a->[$_][0], 1 .. $#$matrix_a;

my $matrix_b = read_file('MatrixB.txt');
my @headers = @{ $matrix_b->[0] };

my @indices = map {
    my $label = $_;
    grep $headers[$_] eq $label, 0 .. $#headers;
} @fields;

for my $row (0 .. $#$matrix_b) {
    print join(' ', map $matrix_b->[$row][$_], @indices), "\n";
}
Output
a b c d
0.1 0.3 0.4 0.2
0.2 0.4 0.7 0.1

Tab Delimited to Square Matrix

I have a tab delimited file like
A B 0.5
A C 0.75
B D 0.2
And I want to convert it to a square matrix, like
A B C D
A 0 0.5 0.75 0
B 0 0 0.2
C 0 0
D 0
How can I go about it in R?
Thanks,
If you have the data in a data frame with the following column names:
Var1 Var2 value
you can use
xtabs(value ~ Var1 + Var2, data = df)
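For instance, assuming the data has been read into a data frame df with those columns, setting common factor levels on both key columns keeps the result square (a small sketch):
df <- data.frame(Var1 = c("A", "A", "B"), Var2 = c("B", "C", "D"),
                 value = c(0.5, 0.75, 0.2))
lev <- sort(unique(c(as.character(df$Var1), as.character(df$Var2))))
df$Var1 <- factor(df$Var1, levels = lev)
df$Var2 <- factor(df$Var2, levels = lev)
xtabs(value ~ Var1 + Var2, data = df)
##     Var2
## Var1    A    B    C    D
##    A 0.00 0.50 0.75 0.00
##    B 0.00 0.00 0.00 0.20
##    C 0.00 0.00 0.00 0.00
##    D 0.00 0.00 0.00 0.00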
See also the plyr package for more general data-reshaping functions.
Another approach (not as elegant as JoFrhwld's)
df <- read.table(textConnection("
Var1 Var2 value
A B 0.5
A C 0.75
B D 0.2
"), header = TRUE, stringsAsFactors = TRUE)  # factors so that levels() below works on newer R versions
lev <- unique(c(levels(df$Var1), levels(df$Var2)))
A <- matrix(rep(0, length(lev)^2), nrow = length(lev))
colnames(A) <- lev
rownames(A) <- lev
apply(df, 1, function(x) A[x[1], x[2]] <<- as.numeric(x[3]))
> A
A B C D
A 0 0.5 0.75 0.0
B 0 0.0 0.00 0.2
C 0 0.0 0.00 0.0
D 0 0.0 0.00 0.0
I'm guessing this is a weighted adjacency matrix for a graph. If so, you might be interested in the igraph package, to read the data as a weighted edge list.
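For example, a minimal igraph sketch (assuming the three-column data frame df from above; the attr argument names the edge attribute created from the value column):
library(igraph)
g <- graph_from_data_frame(df, directed = FALSE)
as_adjacency_matrix(g, attr = "value", sparse = FALSE)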
