Tab Delimited to Square Matrix - r

I have a tab-delimited file like this:
A B 0.5
A C 0.75
B D 0.2
And I want to convert it to a square matrix, like
  A B   C    D
A 0 0.5 0.75 0
B   0   0    0.2
C       0    0
D            0
How can I go about it in R?
Thanks,

If you have the data in a data frame with the following column names:
Var1 Var2 value
you can use
xtabs(value ~ Var1 + Var2, data = df)
See the plyr package for some more general data reshaping functions also.
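Note that xtabs only creates rows for the values found in Var1 and columns for the values found in Var2, so with the example data the result is 2 x 3 rather than square. A minimal sketch of one way around that, assuming both columns are converted to factors sharing the same levels (the helper name lev is introduced here for illustration):

```r
# Example data from the question
df <- data.frame(Var1 = c("A", "A", "B"),
                 Var2 = c("B", "C", "D"),
                 value = c(0.5, 0.75, 0.2),
                 stringsAsFactors = FALSE)

# Give both columns the same set of levels so the table is square
lev <- sort(unique(c(df$Var1, df$Var2)))
df$Var1 <- factor(df$Var1, levels = lev)
df$Var2 <- factor(df$Var2, levels = lev)

# Unused levels are kept by default, so every label gets a row and a column
m <- xtabs(value ~ Var1 + Var2, data = df)
m
```

Pairs that do not appear in the data are filled with 0.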

Another approach (not as elegant as JoFrhwld's)
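df <- read.table(text = "
Var1 Var2 value
A B 0.5
A C 0.75
B D 0.2
", header = TRUE, stringsAsFactors = FALSE)
# All labels that appear in either column
lev <- sort(unique(c(df$Var1, df$Var2)))
# Square matrix of zeros with the labels as row and column names
A <- matrix(0, nrow = length(lev), ncol = length(lev), dimnames = list(lev, lev))
# Fill the cells addressed by (row name, column name) pairs
A[cbind(df$Var1, df$Var2)] <- df$value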
> A
A B C D
A 0 0.5 0.75 0.0
B 0 0.0 0.00 0.2
C 0 0.0 0.00 0.0
D 0 0.0 0.00 0.0
>

I'm guessing this is a weighted adjacency matrix for a graph. If so, you might be interested in the igraph package, to read the data as a weighted edge list.

Related

How to calculate values in an R dataframe when columns are dependent on each other

First time poster, so apologies if this question is basic or poorly explained. Grateful to anyone who can help or point me in the right direction!
I would like to do the following within an R dataframe if possible:
Existing Data
Column A is a vector of values, say 10 to 20.
New Data/columns
Column B will be Column A multiplied by Column C
Column C will be Column C minus Column B from the previous row of data, i.e. C[i] = C[i-1] - B[i-1], apart from the first row of Column C of course, which I will give a fixed value.
I have tried these as separate steps, but I keep overwriting columns B and C, and have a feeling I have been going about this the wrong way! I could share my code, but I think this would confuse matters!
Thanks in advance!
EDIT TO ADD CODE:
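library(dplyr)
A <- c(0.1, 0.2, 0.3, 0.4, 0.5)
df1 <- data.frame(A)
df1$B <- 0
df1$C <- 0
df1$C[1] <- 100
# Attempt: mutate() computes whole columns at once, so a row never sees
# the updated values of the previous row
df2 <- df1 %>%
  mutate(B = C * A,
         C = lag(C - B))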
RESULT FROM THE ABOVE
    A  B  C
1 0.1 10 NA
2 0.2  0 90
3 0.3  0  0
4 0.4  0  0
5 0.5  0  0
EXPECTED OUTPUT
    A     B      C
1 0.1 10.00 100.00
2 0.2 18.00  90.00
3 0.3 21.60  72.00
4 0.4 20.16  50.40
5 0.5 15.12  30.24
C2 = C1 - B1
B2 = C2 * A2
We can use accumulate from purrr to do the recursive update:
library(dplyr)
library(purrr)
tmp <- with(df1, accumulate(A, ~ .x - (.x * .y), .init = first(C)))
df2 <- df1 %>%
mutate(C = head(tmp, -1), B = -diff(tmp))
df2
# A B C
#1 0.1 10.00 100.00
#2 0.2 18.00 90.00
#3 0.3 21.60 72.00
#4 0.4 20.16 50.40
#5 0.5 15.12 30.24
Or use base R
tmp <- with(df1, Reduce(function(x, y) x - (x * y), A,
accumulate = TRUE, init = C[1]))
df2 <- transform(df1, C = head(tmp, -1), B = -diff(tmp))
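For comparison, the same recurrence (C2 = C1 - B1, B2 = C2 * A2) can be spelled out as an explicit loop. This is only a sketch of the logic, using the values from the question; the vector names B and C and the result name df_loop are chosen here for illustration:

```r
A <- c(0.1, 0.2, 0.3, 0.4, 0.5)
n <- length(A)

B <- numeric(n)
C <- numeric(n)

# The first row is fixed by the starting value of C
C[1] <- 100
B[1] <- C[1] * A[1]

# Every later row depends on the previous row
for (i in seq_len(n)[-1]) {
  C[i] <- C[i - 1] - B[i - 1]
  B[i] <- C[i] * A[i]
}

df_loop <- data.frame(A, B, C)
df_loop
```

The loop is slower than the accumulate and Reduce versions on long vectors, but it makes the row-by-row dependency easy to check.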
If you don't mind using a mathematical approach, you can first derive the general expression for the recursion and then have the R code afterwards.
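One way to derive that general expression, using the recurrence from the question. The subscript notation (row index i, with C_1 the fixed starting value) is introduced here for illustration:

```latex
\begin{aligned}
B_i &= A_i \, C_i, \\
C_i &= C_{i-1} - B_{i-1} = C_{i-1} - A_{i-1} C_{i-1} = (1 - A_{i-1}) \, C_{i-1}, \qquad i \ge 2, \\
\Rightarrow \quad C_i &= C_1 \prod_{k=1}^{i-1} (1 - A_k), \qquad B_i = A_i \, C_1 \prod_{k=1}^{i-1} (1 - A_k).
\end{aligned}
```

The product is a cumulative product of 1 - A shifted down by one row, which is what cumprod computes.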
Below is one implementation with base R
transform(
transform(
df1,
C = C[1] * c(1, cumprod(1 - A)[-nrow(df1)])
),
B = A * C
)
which gives
A B C
1 0.1 10.00 100.00
2 0.2 18.00 90.00
3 0.3 21.60 72.00
4 0.4 20.16 50.40
5 0.5 15.12 30.24
A data.table option in a similar manner is
> setDT(df1)[, C := first(C) * c(1, cumprod(1 - A)[-.N])][, B := A * C][]
A B C
1: 0.1 10.00 100.00
2: 0.2 18.00 90.00
3: 0.3 21.60 72.00
4: 0.4 20.16 50.40
5: 0.5 15.12 30.24

How to collect outputs of multivariable vector-valued function into a dataframe?

I have a function f1 that takes a pair of real numbers (x, y) and returns a triple of real numbers. I would like to collect all outputs of this function for all x in a vector a and y in a vector b. Could you please elaborate on how to do so?
f1 <- function(x, y){
return (c(x+y, x-y, x*y))
}
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)
Update: I mean for all pairs $(x, y) \in a \times b$.
Here is a data.table option
setDT(expand.grid(a, b))[, fval := do.call(Vectorize(f1, SIMPLIFY = FALSE), unname(.SD))][]
where expand.grid + do.call + Vectorize are used, giving
Var1 Var2 fval
1: 0.0 0.0 0,0,0
2: 0.1 0.0 0.1,0.1,0.0
3: 0.2 0.0 0.2,0.2,0.0
4: 0.3 0.0 0.3,0.3,0.0
5: 0.4 0.0 0.4,0.4,0.0
---
2012: 2.7 6.2 8.90,-3.50,16.74
2013: 2.8 6.2 9.00,-3.40,17.36
2014: 2.9 6.2 9.10,-3.30,17.98
2015: 3.0 6.2 9.2,-3.2,18.6
2016: 3.1 6.2 9.30,-3.10,19.22
A more compact option is to use CJ(a, b) instead of setDT(expand.grid(a, b)) (thanks to @akrun's advice).
We can use expand.grid to expand the data between 'a', and 'b' values, then loop over the row with apply, MARGIN = 1 and apply the f1
out <- as.data.frame(t(apply(expand.grid(a, b), 1, function(x) f1(x[1], x[2]))))
Or with tidyverse
library(dplyr)
library(purrr)
library(tidyr)
out2 <- crossing(x = a, y = b) %>%
pmap_dfr(f2)
-output
head(out2)
# A tibble: 6 x 3
# add subtract multiply
# <dbl> <dbl> <dbl>
#1 0 0 0
#2 0.1 -0.1 0
#3 0.2 -0.2 0
#4 0.3 -0.3 0
#5 0.4 -0.4 0
#6 0.5 -0.5 0
where f2 is
f2 <- function(x, y){
return (tibble(add = x+y, subtract = x-y, multiply = x*y))
}
It may be better to return a list or tibble so that the results are easier to bind into a data frame.
Create all possible combinations with expand.grid and use Map to apply f1 to every pair.
val <- expand.grid(a, b)
result <- do.call(rbind, Map(f1, val$Var1, val$Var2))
head(result)
# [,1] [,2] [,3]
#[1,] 0.0 0.0 0
#[2,] 0.1 0.1 0
#[3,] 0.2 0.2 0
#[4,] 0.3 0.3 0
#[5,] 0.4 0.4 0
#[6,] 0.5 0.5 0
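If a data frame that keeps the inputs next to the outputs is wanted, a base R variant along the same lines could look like the sketch below. The column names x, y, sum, diff and prod are chosen here for illustration and are not part of the question:

```r
f1 <- function(x, y) {
  c(x + y, x - y, x * y)
}
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)

# All (x, y) pairs, one per row
grid <- expand.grid(x = a, y = b)

# mapply returns a 3 x n matrix (one column per pair); transpose to n x 3
vals <- t(mapply(f1, grid$x, grid$y))
colnames(vals) <- c("sum", "diff", "prod")

# Inputs and outputs side by side in one data frame
out <- cbind(grid, vals)
head(out)
```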

convert from long to symmetrical square wide format in R

I would like to convert this dataframe
tmp <- data.frame(V1=c("A","A","B"),V2=c("B","C","C"),V3=c(0.2,0.4,0.1))
tmp
V1 V2 V3
1 A B 0.2
2 A C 0.4
3 B C 0.1
into a square matrix like this (which should ultimately be a dist object):
A B C
A 0
B 0.2 0
C 0.4 0.1 0
I tried different approaches based on functions reshape, spread or xtabs but I cannot get the right dimension. Thanks for your help.
Maybe you can try the code below
d <- sort(unique(unlist(tmp[1:2])))
m <- `dimnames<-`(matrix(0,length(d),length(d)),list(d,d))
m[as.matrix(tmp[1:2])] <- tmp$V3
res <- t(m) + m
such that
> res
A B C
A 0.0 0.2 0.4
B 0.2 0.0 0.1
C 0.4 0.1 0.0
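Since the question ultimately asks for a dist object, the symmetric matrix can be passed to as.dist from the stats package. A minimal self-contained sketch that repeats the steps above (the name dd is introduced here for illustration):

```r
tmp <- data.frame(V1 = c("A", "A", "B"),
                  V2 = c("B", "C", "C"),
                  V3 = c(0.2, 0.4, 0.1),
                  stringsAsFactors = FALSE)

# Labels that appear in either column
d <- sort(unique(unname(unlist(tmp[1:2]))))

# Fill one triangle by (row name, column name), then mirror it
m <- matrix(0, length(d), length(d), dimnames = list(d, d))
m[as.matrix(tmp[1:2])] <- tmp$V3
res <- m + t(m)

# as.dist keeps the lower triangle of the square matrix
dd <- as.dist(res)
dd
```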
You can also create your own dist object this way using structure:
tmp_lab <- unique(c(as.character(tmp$V1), as.character(tmp$V2)))
structure(tmp$V3,
Size = length(tmp_lab),
Labels = tmp_lab,
Diag = TRUE,
Upper = FALSE,
method = "user",
class = "dist")
Output
A B C
A 0.0
B 0.2 0.0
C 0.4 0.1 0.0
Here is an option with xtabs after converting the columns 'V1' , 'V2' to factor with levels specified as the same
tmp[1:2] <- lapply(tmp[1:2], factor, levels = c('A', 'B', 'C'))
as.dist(xtabs(V3 ~ V2 + V1, tmp), diag = TRUE)
# A B C
#A 0.0
#B 0.2 0.0
#C 0.4 0.1 0.0

dataframe column wise subtraction and division.

I need help with column-wise subtraction and division across N columns. Below are the columns in the input data frame.
input dataframe:
> df
A B C D
1 1 3 6 2
2 3 3 3 4
3 1 2 2 2
4 4 4 4 4
5 5 2 3 2
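Formula: the first column a is kept as is; each later column becomes (b - a) / (1 - a), where a is the previous column and b is the current one.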
MY CODE
ABC <- cbind.data.frame(DF[1], (DF[-1] - DF[-ncol(DF)])/(1 - DF[-ncol(DF)]))
Expected out:
A B C D
1 Inf -1.5 0.8
3 0.00 0.0 -0.5
1 Inf 0.0 0.0
4 0.00 0.0 0.0
5 0.75 -1.0 0.5
But I don't want to use ncol here, because the actual data frame has one more column after column D.
I want to apply this formula only to the first 4 columns; if I use ncol, it will run through to the last column of the data frame.
Please help, thanks.
What about trying:
df <- matrix(c(1,3,6,2,3,3,3,4,1,2,2,2,4,4,4,4,5,2,3,2), nrow = 5, byrow = TRUE)
df_2 <- matrix((df[,2]-df[,1])/(1-df[,1]),5,1)
df_3 <- matrix((df[,3]-df[,2])/(1-df[,2]),5,1)
df_4 <- matrix((df[,4]-df[,3])/(1-df[,3]),5,1)
cbind(df[,1],df_2,df_3,df_4)
edit: a loop version
df <- matrix(c(1,3,6,2,3,3,3,4,1,2,2,2,4,4,4,4,5,2,3,2), nrow = 5, byrow = TRUE)
test_bind <- c()
test_bind <- cbind(test_bind, df[,1])
for (i in 1:3) {
  df_1 <- matrix((df[, i + 1] - df[, i]) / (1 - df[, i]), 5, 1)
  test_bind <- cbind(test_bind, df_1)
}
test_bind
Here is one option with tidyverse:
library(dplyr)
library(purrr)
map2_df(DF[2:4], DF[1:3], ~ (.x - .y)/(1- .y)) %>%
bind_cols(DF[1], .)
# A B C D
#1 1 Inf -1.5 0.8
#2 3 0.00 0.0 -0.5
#3 1 Inf 0.0 0.0
#4 4 0.00 0.0 0.0
#5 5 0.75 -1.0 0.5
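A base R sketch of the same idea, limited to the first four columns so that any trailing column is ignored. The extra column E and the variable k are hypothetical, added here only to stand in for the "last column" mentioned in the question:

```r
# Data from the question plus a hypothetical trailing column E
DF <- data.frame(A = c(1, 3, 1, 4, 5),
                 B = c(3, 3, 2, 4, 2),
                 C = c(6, 3, 2, 4, 3),
                 D = c(2, 4, 2, 4, 2),
                 E = c(9, 9, 9, 9, 9))

k <- 4                  # apply the formula to columns 1..k only
cur  <- DF[2:k]         # columns B, C, D
prev <- DF[1:(k - 1)]   # columns A, B, C

# (b - a) / (1 - a), computed for all column pairs at once
out <- cbind(DF[1], (cur - prev) / (1 - prev))
out
```

Changing k is enough to widen or narrow the range of columns; ncol(DF) is never used.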

Reordering rows and columns in R

I know this has been answered before, but,
given a correlation matrix which looks like this:
V A B C D
A 1 0.3 0.1 0.4
B 0.2 1 0.4 0.3
C 0.1 0 1 0.9
D 0.3 0.3 0.1 1
which can be loaded in R as follows:
corr.matrix <- read.table("path/to/file", sep = '\t', header = T)
rownames(corr.matrix) <- corr.matrix$V
corr.matrix <- corr.matrix[, 2:ncol(corr.matrix)]
Based on two other files that specify which rows and columns should be plotted (some are of no interest to me), I want to rearrange the rows and columns into the order those two files dictate.
For example:
cols_order.txt
C
D
E
B
A
...
rows_order.txt
D
E
Z
B
T
A
...
I read those other 2 files like this:
rows.order <- read.table("rows_order.txt", sep = '\n', header = FALSE)
colnames(rows.order) <- "Variant"
cols.order <- read.table("cols_order.txt", sep = '\n', header = FALSE)
colnames(cols.order) <- "Variant"
And after this step I do this:
corr.matrix <- corr.matrix[rows.order$Variant, cols.order$Variant]
The values that I don't want to be plotted are successfully removed, but the order gets scrambled. How can I fix this?
The .order datasets are read correctly (I checked 3 times).
Here is a potential solution to your question. I tried to re-create a small-sized data.frame based on your question. The key here is the match function as well as some basic subsetting/filtering techniques in R:
## Re-create your example:
V <- data.frame(
A = c(1 , 0.3, 0.1 , 0.4),
B = c(0.2, 1 , 0.4 , 0.3),
C = c(0.1, 0 , 1 , 0.9),
D = c(0.3, 0.3, 0.1 , 1)
) #matrix() also ok
rownames(V) <- LETTERS[1:4]
## Reorder using `match` function
## Needs to be in data.frame form
## So use as.data.frame() if needed
## Here, I don't have the text file
## So if you want to load in txt files specifying rows columns
## Use `read.csv` or `read.table` to load
## And then store the relevant info into a vector as you did
col_order <- c("C","D","E","B","A")
col_order_filtered <- col_order[which(col_order %in% colnames(V))]
rows <- c("D","E","Z","B","T","A")
## Filter rows IDs, since not all are present in your data
row_filtered <- rows[rows %in% rownames(V)]
## match(wanted, available) returns positions in the requested order;
## both vectors were filtered above, so no NA positions remain
V1 <- V[match(row_filtered, rownames(V)), match(col_order_filtered, colnames(V))]
V1
##     C   D   B   A
## D 0.9 1.0 0.3 0.4
## B 0.0 0.3 1.0 0.3
## A 0.1 0.3 0.2 1.0
Alternatively, if you are comfortable with dplyr package and the syntax, you can use it and often it is handy:
## Continued from previous code
library(dplyr)
V2 <- V %>%
  select(C, D, B, A, everything()) %>%
  slice(match(row_filtered, rownames(V)))
rownames(V2) <- row_filtered
V2
##     C   D   B   A
## D 0.9 1.0 0.3 0.4
## B 0.0 0.3 1.0 0.3
## A 0.1 0.3 0.2 1.0
Hope that helps.
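One possible cause of the scrambled order, offered as an assumption since the question does not show the column types: before R 4.0.0, read.table converted strings to factors by default, and indexing with a factor uses its integer codes rather than its labels. A small sketch with a made-up matrix m and hypothetical order vectors:

```r
# Made-up 4 x 4 matrix with row and column names A..D
m <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Order vectors as they would look if read in as factors (assumption)
rows_wanted <- factor(c("D", "E", "Z", "B", "T", "A"))
cols_wanted <- factor(c("C", "D", "E", "B", "A"))

# m[rows_wanted, ] would index by the integer codes below, not by name
as.integer(rows_wanted)

# Convert to character and drop names that are not in the matrix
rows_chr <- as.character(rows_wanted)
cols_chr <- as.character(cols_wanted)
rows_keep <- rows_chr[rows_chr %in% rownames(m)]
cols_keep <- cols_chr[cols_chr %in% colnames(m)]

# Name-based indexing preserves the requested order
reordered <- m[rows_keep, cols_keep, drop = FALSE]
reordered
```

If the order vectors are already character, the as.character calls are harmless and the rest still applies.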
