Reshape dataframe and create similarity matrix

I have a data table and have tried reshaping it, but it doesn't work. How do I do this?
The table looks like this:
Name | Value
-------------
Bob | 8,9,10
------------
Mike | 2,3,4
------------
Sandr| 5,6,7
How do I make this into a list like:
Value | Name
-------------
2 | Mike
3 | Mike
4 | Mike
5 | Sandr
6 | Sandr
7 | Sandr
8 | Bob
9 | Bob
10 | Bob
And then make this list into a matrix like:
   |  2  3  4  5  6  7  8  9 10
-------------------------------
 2 |  1  1  1  0  0  0  0  0  0
 3 |  1  1  1  0  0  0  0  0  0
 4 |  1  1  1  0  0  0  0  0  0
 5 |  0  0  0  1  1  1  0  0  0
 6 |  0  0  0  1  1  1  0  0  0
 7 |  0  0  0  1  1  1  0  0  0
 8 |  0  0  0  0  0  0  1  1  1
 9 |  0  0  0  0  0  0  1  1  1
10 |  0  0  0  0  0  0  1  1  1

The functions you are looking for are stack and contrasts.
data <- list(bob=c(8,9,10), mike=c(2,3,4), sandr=c(5,6,7))
as.data.frame(data)
#   bob mike sandr
# 1   8    2     5
# 2   9    3     6
# 3  10    4     7
stack(data)
#   values   ind
# 1      8   bob
# 2      9   bob
# 3     10   bob
# 4      2  mike
# 5      3  mike
# 6      4  mike
# 7      5 sandr
# 8      6 sandr
# 9      7 sandr
df <- stack(data)
contrasts(df$ind, contrasts=FALSE)[df$ind, df$ind]
#       bob bob bob mike mike mike sandr sandr sandr
# bob     1   1   1    0    0    0     0     0     0
# bob     1   1   1    0    0    0     0     0     0
# bob     1   1   1    0    0    0     0     0     0
# mike    0   0   0    1    1    1     0     0     0
# mike    0   0   0    1    1    1     0     0     0
# mike    0   0   0    1    1    1     0     0     0
# sandr   0   0   0    0    0    0     1     1     1
# sandr   0   0   0    0    0    0     1     1     1
# sandr   0   0   0    0    0    0     1     1     1
You can then assign the values as row and column names, and sort if desired:
im<-contrasts(df$ind,contrasts=FALSE)[df$ind,df$ind]
rownames(im)<-df$values
colnames(im)<-df$values
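As a minimal sketch of the sorting step mentioned above: reorder rows and columns with the same permutation so the matrix follows the numeric order of the values. The names `o` and `im_sorted` are introduced here for illustration only.

```r
# Rebuild the example data from the answer above
data <- list(bob = c(8, 9, 10), mike = c(2, 3, 4), sandr = c(5, 6, 7))
df <- stack(data)

# Identity contrasts indexed by the factor: 1 wherever two values share a name
im <- contrasts(df$ind, contrasts = FALSE)[df$ind, df$ind]
rownames(im) <- df$values
colnames(im) <- df$values

# 'o' is an illustrative helper: the permutation that sorts the values
o <- order(df$values)

# Apply the same permutation to rows and columns so the blocks stay aligned
im_sorted <- im[o, o]
im_sorted
```

Using one permutation for both dimensions keeps the matrix symmetric, which is what a similarity matrix needs.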

res <- read.table(text="Name | Value
Bob | 8,9,10
Mike | 2,3,4
Sandr| 5,6,7", header=TRUE, sep="|", strip.white=TRUE, stringsAsFactors=FALSE)
# split the comma-separated values and repeat each name once per value
vals <- strsplit(res$Value, ",")
dres <- data.frame(Value=as.numeric(unlist(vals)),
                   Name=rep(res$Name, times=lengths(vals)))
# sort numerically by value
dres <- dres[order(dres$Value), ]
dres
# compare the names pairwise: TRUE where two values belong to the same name
outer(dres$Name, dres$Name, FUN="==")
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[3,] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[4,] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
[7,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[8,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[9,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE

Related

Filter genes >10 reads in at least 2 replicates

Hello, I need to filter this data to keep only genes with >10 reads in at least 2 replicates.
PancreasLungDesign
Individual sex age RNA.quality..max10. organ tissue
GTEX-1KXAM-0226-SM-EV7AP GTEX-1KXAM 1 60-69 7.2 Pancreas Pancreas
GTEX-18A67-1726-SM-7KFT9 GTEX-18A67 1 50-59 7.5 Pancreas Pancreas
GTEX-14BMU-0726-SM-73KXS GTEX-14BMU 2 20-29 7.2 Pancreas Pancreas
GTEX-13PVR-0726-SM-5S2PX GTEX-13PVR 2 60-69 7.3 Pancreas Pancreas
GTEX-1211K-1126-SM-5EGGB GTEX-1211K 2 60-69 7.3 Pancreas Pancreas
GTEX-11TT1-0326-SM-5LUAY GTEX-11TT1 1 20-29 7.7 Pancreas Pancreas
GTEX-1KXAM-0426-SM-DHXKG GTEX-1KXAM 1 60-69 9.1 Lung Lung
GTEX-18A67-1126-SM-7KFSB GTEX-18A67 1 50-59 8.5 Lung Lung
GTEX-14BMU-0526-SM-73KW4 GTEX-14BMU 2 20-29 9.2 Lung Lung
GTEX-1211K-0826-SM-5FQUP GTEX-1211K 2 60-69 8.2 Lung Lung
GTEX-11TT1-1626-SM-5EQL7 GTEX-11TT1 1 20-29 8.9 Lung Lung
GTEX-ZYFG-0226-SM-5GIDT GTEX-ZYFG 2 60-69 8.7 Lung Lung
counts
GTEX-Y5V6-0526-SM-4VBRV GTEX-1KXAM-1726-SM-D3LAE GTEX-18A67-0826-SM-7KFTI GTEX-14BMU-0226-SM-5S2QA
ENSG00000243485 0 0 0 0
ENSG00000237613 1 0 0 0
ENSG00000186092 2 2 2 0
ENSG00000238009 1 0 0 12
ENSG00000222623 0 0 0 0
ENSG00000241599 0 0 0 0
ENSG00000236601 0 0 0 0
ENSG00000235146 0 0 0 0
ENSG00000223181 0 0 0 0
ENSG00000237491 214 205 164 108
ENSG00000177757 20 40 57 42
ENSG00000225880 214 114 146 149
ENSG00000230368 2 4 2 0
ENSG00000272438 5 2 11 2
ENSG00000230699 27 37 25 23
ENSG00000241180 0 0 0 0
GTEX-13PVR-0626-SM-5S2RC GTEX-1211K-0726-SM-5FQUW GTEX-1KXAM-0926-SM-CXZKA GTEX-18A67-2626-SM-718AD
ENSG00000243485 0 0 2 7
ENSG00000237613 0 0 1 3
ENSG00000186092 0 0 2 7
ENSG00000238009 0 2 2 2
ENSG00000222623 0 0 0 0
ENSG00000241599 0 0 0 1
ENSG00000236601 1 1 0 5
ENSG00000235146 0 0 0 0
ENSG00000223181 0 0 0 0
ENSG00000237491 100 174 99 116
ENSG00000177757 60 73 27 36
ENSG00000225880 126 221 101 97
ENSG00000230368 0 12 6 6
ENSG00000272438 4 0 3 5
ENSG00000230699 32 10 20 24
ENSG00000241180 0 0 0 0
I tried with these lines of code:
Expressedgenes=counts>10
NumExpressedgenes=apply(Expressedgenes,1,sum)
FilteredCounts=counts[NumExpressedgenes>0,]
But this way I keep genes that have >10 reads in only 1 replicate, right? How can I filter for at least 2?
GTEX-Y5V6-0526-SM-4VBRV GTEX-1KXAM-1726-SM-D3LAE GTEX-18A67-0826-SM-7KFTI GTEX-14BMU-0226-SM-5S2QA
ENSG00000243485 FALSE FALSE FALSE FALSE
ENSG00000237613 FALSE FALSE FALSE FALSE
ENSG00000186092 FALSE FALSE FALSE FALSE
ENSG00000238009 FALSE FALSE FALSE TRUE
ENSG00000222623 FALSE FALSE FALSE FALSE
ENSG00000241599 FALSE FALSE FALSE FALSE
ENSG00000236601 FALSE FALSE FALSE FALSE
ENSG00000235146 FALSE FALSE FALSE FALSE
ENSG00000223181 FALSE FALSE FALSE FALSE
ENSG00000237491 TRUE TRUE TRUE TRUE
ENSG00000177757 TRUE TRUE TRUE TRUE
ENSG00000225880 TRUE TRUE TRUE TRUE
ENSG00000230368 FALSE FALSE FALSE FALSE
ENSG00000272438 FALSE FALSE TRUE FALSE
ENSG00000230699 TRUE TRUE TRUE TRUE
ENSG00000241180 FALSE FALSE FALSE FALSE

If we need at least two columns to be TRUE in a row, count the TRUE values per row with rowSums and keep the rows where that count is greater than 1:
i1 <- rowSums(counts>10, na.rm = TRUE) > 1
counts[i1,]
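A minimal sketch of the same filter on a tiny, made-up count matrix (the gene and replicate names, and the variables `min_reads` and `min_reps`, are illustrative assumptions, not the asker's data):

```r
# Hypothetical toy counts: 5 genes x 4 replicates
counts <- matrix(
  c( 0,  0,  0,  0,
     1,  0,  0, 12,
    20, 40, 57, 42,
     5,  2, 11,  2,
     0, 12, 11,  6),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:5]), paste0("rep", 1:4))
)

min_reads <- 10  # a replicate counts only if reads > min_reads
min_reps  <- 2   # a gene needs at least this many such replicates

# counts > min_reads is a logical matrix; rowSums counts the TRUEs per gene
keep <- rowSums(counts > min_reads, na.rm = TRUE) >= min_reps

# drop = FALSE keeps the result a matrix even if only one gene survives
filtered <- counts[keep, , drop = FALSE]
filtered
```

In terms of the code in the question, this corresponds to changing the condition `NumExpressedgenes>0` to `NumExpressedgenes>=2`.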

How to remove columns according to simultaneous conditions

I need to delete the columns (from the second onwards) that have values other than 0 in the rows whose first column has specific values (e.g., sp.3 and sp.5).
My dataset is large, but here is a small sample of the data.
SP id2324 id8283 id3912 id3912 id1231...
sp.1 0 2 4 1 0
sp.2 12 10 2 3 15
sp.3 0 0 23 0 4
sp.4 2 2 11 19 0
sp.5 0 0 0 0 3
sp.6 3 1 7 3 0
sp.7 0 14 1 0 12
sp.8 1 0 2 6 6
In this small example I would expect the id3912 and id1231 variables to disappear.

We can first select the rows where SP is c("sp.3", "sp.5"), then keep only the columns in which all of those rows are 0 (that is, drop any column with at least one nonzero value there).
cbind(df[1], df[-1][colSums(df[df$SP %in% c("sp.3", "sp.5"), -1] != 0) == 0])
# SP id2324 id8283 id3912.1
#1 sp.1 0 2 1
#2 sp.2 12 10 3
#3 sp.3 0 0 0
#4 sp.4 2 2 19
#5 sp.5 0 0 0
#6 sp.6 3 1 3
#7 sp.7 0 14 0
#8 sp.8 1 0 6
Breaking it down step-by-step
Select rows where SP is c("sp.3", "sp.5")
df[df$SP %in% c("sp.3", "sp.5"), -1]
# id2324 id8283 id3912 id3912.1 id1231
#3 0 0 23 0 4
#5 0 0 0 0 3
Find cells where value is not equal to 0
df[df$SP %in% c("sp.3", "sp.5"), -1] != 0
# id2324 id8283 id3912 id3912.1 id1231
#3 FALSE FALSE TRUE FALSE TRUE
#5 FALSE FALSE FALSE FALSE TRUE
Find columns where all values are 0
colSums(df[df$SP %in% c("sp.3", "sp.5"), -1] != 0) == 0
# id2324 id8283 id3912 id3912.1 id1231
# TRUE TRUE FALSE TRUE FALSE
We then select the columns which are TRUE and cbind them with 1st column.
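The steps above can be sketched end to end as follows. The data frame is rebuilt from the sample in the question; the duplicated column is named `id3912.1` as in the answer's output, and `target`, `sub`, `keep` and `result` are names chosen here for illustration.

```r
# Sample data from the question (duplicate column name made unique)
df <- data.frame(
  SP       = paste0("sp.", 1:8),
  id2324   = c(0, 12,  0,  2, 0, 3,  0, 1),
  id8283   = c(2, 10,  0,  2, 0, 1, 14, 0),
  id3912   = c(4,  2, 23, 11, 0, 7,  1, 2),
  id3912.1 = c(1,  3,  0, 19, 0, 3,  0, 6),
  id1231   = c(0, 15,  4,  0, 3, 0, 12, 6)
)

target <- c("sp.3", "sp.5")

# Rows of interest, without the SP column
sub <- df[df$SP %in% target, -1, drop = FALSE]

# TRUE for columns that are 0 in every target row
keep <- colSums(sub != 0) == 0

# Keep SP plus the columns flagged TRUE
result <- df[, c("SP", names(keep)[keep]), drop = FALSE]
result
```

Selecting by name rather than with `cbind` avoids any dependence on column position, which may be convenient on a large dataset.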

convert text into binary matrix in R

I have the following text:
#------------------
# CONTENTS OF TEXT
#------------------
H01, H04, G02, G06,
H01, H02, G02, H05,
G01, H04, H01
G09, G05
I want to convert this data into a binary matrix. I want the output to look like this:
H01 H02 H04 H05 G01 G02 G05 G06 G09
1 0 1 0 0 1 0 1 0
1 1 0 1 0 1 0 0 0
1 0 1 0 1 0 0 0 0
0 0 0 0 0 0 1 0 1
Please help

You can do:
d <- read.table(header=FALSE, sep='§', stringsAsFactors = FALSE, text=
'H01, H04, G02, G06,
H01, H02, G02, H05,
G01, H04, H01
G09, G05')
s <- sort(unique(unlist(strsplit(d$V1, ', *'))))
m <- sapply(s, grepl, x=d$V1, fixed=TRUE)
# > m
# G01 G02 G05 G06 G09 H01 H02 H04 H05
# [1,] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
# [2,] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE
# [3,] TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
# [4,] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
m[] <- as.integer(m)
# > m
# G01 G02 G05 G06 G09 H01 H02 H04 H05
# [1,] 0 1 0 1 0 1 0 1 0
# [2,] 0 1 0 0 0 1 1 0 1
# [3,] 1 0 0 0 0 1 0 1 0
# [4,] 0 0 1 0 1 0 0 0 0
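One caveat with `grepl(..., fixed=TRUE)` is that it matches substrings, so a code that is contained in a longer code (a hypothetical `H01` versus `H011`) would be flagged as well. A minimal sketch of an exact-match variant, assuming the lines are already available as a character vector (`lines`, `tokens` and `codes` are names introduced here):

```r
# The text from the question, one element per line
lines <- c("H01, H04, G02, G06,",
           "H01, H02, G02, H05,",
           "G01, H04, H01",
           "G09, G05")

# Split each line into its codes; strsplit drops the trailing empty field
tokens <- strsplit(lines, ", *")

# All distinct codes, sorted, become the columns
codes <- sort(unique(unlist(tokens)))

# For each line, mark which codes occur in it (exact match via %in%)
m <- t(sapply(tokens, function(tk) as.integer(codes %in% tk)))
colnames(m) <- codes
m
```

To get the column order shown in the question (H codes before G codes), the columns could be reordered afterwards, for example with `m[, c("H01", "H02", "H04", "H05", "G01", "G02", "G05", "G06", "G09")]`.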

Another idea, using @jogo's data:
library(dplyr)
library(tidyr)
d %>%
mutate(V1 = stringi::stri_extract_all_words(V1), V2 = 1) %>%
unnest(V1, .id = "id") %>%
spread(V1, V2, fill = 0)
Which gives:
# id G01 G02 G05 G06 G09 H01 H02 H04 H05
#1 1 0 1 0 1 0 1 0 1 0
#2 2 0 1 0 0 0 1 1 0 1
#3 3 1 0 0 0 0 1 0 1 0
#4 4 0 0 1 0 1 0 0 0 0

How to find Z score of each value in row of Table?

I have a table in R. How do I turn every value that is greater than or equal to a certain number into 1 and all other values into 0? For example, if my special number were 4, every value that is 4 or above would become 1, and the rest would become 0. So this table:
a b c d e
Bill 1 2 3 4 5
Susan 4 1 5 4 2
Malcolm 4 5 6 2 1
Reese 0 0 2 3 8
would turn into:
a b c d e
Bill 0 0 0 1 1
Susan 1 0 1 1 0
Malcolm 1 1 1 0 0
Reese 0 0 0 0 1

We can create a logical matrix of TRUE/FALSE and convert to binary format by using +
+(df1>=4)
# a b c d e
#Bill 0 0 0 1 1
#Susan 1 0 1 1 0
#Malcolm 1 1 1 0 0
#Reese 0 0 0 0 1
Just to be clear, when we do the >=, it creates a logical matrix of TRUE/FALSE
df1 >=4
# a b c d e
#Bill FALSE FALSE FALSE TRUE TRUE
#Susan TRUE FALSE TRUE TRUE FALSE
#Malcolm TRUE TRUE TRUE FALSE FALSE
#Reese FALSE FALSE FALSE FALSE TRUE
But the OP wanted this converted to 1/0. There are many ways to coerce TRUE/FALSE to binary form. One option is
(df1>=4) + 0L
Or
(df1>=4)*1L
Or simply putting a + will do the coercion
+(df1>=4)
According to ?TRUE
Logical vectors are coerced to integer vectors in contexts where a
numerical value is required, with ‘TRUE’ being mapped to ‘1L’,
‘FALSE’ to ‘0L’ and ‘NA’ to ‘NA_integer_’.
We could also wrap with as.integer, but the output will be a vector
as.integer(df1>=4)
#[1] 0 1 1 0 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1
If we assign the output back to the original dataset, we can change that dataset and keep its structure
df1[] <- as.integer(df1>=4)
df1
# a b c d e
#Bill 0 0 0 1 1
#Susan 1 0 1 1 0
#Malcolm 1 1 1 0 0
#Reese 0 0 0 0 1
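A minimal self-contained sketch that rebuilds the example table and checks that the three coercion idioms above agree. The variable `threshold` and the result names are introduced here for illustration.

```r
# Example table from the question
df1 <- data.frame(
  a = c(1, 4, 4, 0),
  b = c(2, 1, 5, 0),
  c = c(3, 5, 6, 2),
  d = c(4, 4, 2, 3),
  e = c(5, 2, 1, 8),
  row.names = c("Bill", "Susan", "Malcolm", "Reese")
)

threshold <- 4

# Three ways to coerce the logical matrix to 0/1
m_plus <- +(df1 >= threshold)
m_add  <- (df1 >= threshold) + 0L
m_mul  <- (df1 >= threshold) * 1L

# All three give the same values
all(m_plus == m_add) && all(m_add == m_mul)

m_plus
```

Each result is a matrix rather than a data frame; assigning into `df1[]`, as shown above, is the way to keep the data frame structure.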

How fill a matrix with 1 and 0 when there is association

I've been trying to make a matrix from a data frame in R, without success. I have the following data frame:
Order Object idrA idoA
8001505892 CHR56029398AB 1 1
8001506013 CHR56029398AB 1 2
8001507782 CHR56029398AB 1 3
8001508088 CHR56029398AB 1 4
8001508788 CHR56029398AB 1 5
8001509281 CHR56029398AB 1 6
8001509322 CHR56029398AB 1 7
8001509373 CHR56029398AB 1 8
8001505342 MMRMD343563 2 9
8001506699 MMRMD343563 2 10
8001507102 MMRMD343563 2 11
8001507193 MMRMD343563 2 12
8001508554 MMRMD343563 2 13
8001508654 MMRMD343563 2 14
8001509151 MMRMD343563 2 15
8001509707 MMRMD343563 2 16
8001509712 MMRMD343563 2 17
8001509977 MMRMD343563 2 18
8001510279 MMRMD343563 2 19
8001505342 MMRMD343565 3 9
8001507112 MMRMD343565 3 20
8001507193 MMRMD343565 3 12
8001508554 MMRMD343565 3 13
8001508654 MMRMD343565 3 14
8001509151 MMRMD343565 3 15
8001509707 MMRMD343565 3 16
8001509712 MMRMD343565 3 17
8001509977 MMRMD343565 3 18
8001510279 MMRMD343565 3 19
8001505920 MMRMN146319 4 21
8001506733 MMRMN146319 4 22
8001506929 MMRMN146319 4 23
8001507112 MMRMN146319 4 20
8001507196 MMRMN146319 4 24
8001510302 MMRMN146319 4 25
8001517272 MMRMN146319 4 26
8001506186 MMRMN146320 5 27
8001506733 MMRMN146320 5 22
8001506929 MMRMN146320 5 23
8001507112 MMRMN146320 5 20
8001508638 MMRMN146320 5 28
8001509526 MMRMN146320 5 29
8001505452 SSR664050011 6 30
8001508551 SSR664050011 6 31
8001509229 SSR664050011 6 32
8001510174 SSR664050011 6 33
Here idr is the ID of each object and ido is the ID of each purchase order. I want to make a matrix with one row per order and one column per object, filled with 1s and 0s: a 1 when the object was purchased in that order and a 0 when it wasn't.
Example: the order with ido=20 must have a vector like this: (0,0,1,1,1,0).
I hope I have explained this clearly, thanks!

You can use xtabs to create a cross table:
Recreate your data:
dat <- read.table(header=TRUE, text="
Order Object idrA idoA
8001505892 CHR56029398AB 1 1
....
8001506013 CHR56029398AB 1 2
8001507782 CHR56029398AB 1 3
8001509229 SSR664050011 6 32
8001510174 SSR664050011 6 33")
Create the cross table:
xtabs(Order ~ idoA + idrA, dat) != 0
idrA
idoA 1 2 3 4 5 6
1 TRUE FALSE FALSE FALSE FALSE FALSE
2 TRUE FALSE FALSE FALSE FALSE FALSE
....
20 FALSE FALSE TRUE TRUE TRUE FALSE
....
32 FALSE FALSE FALSE FALSE FALSE TRUE
33 FALSE FALSE FALSE FALSE FALSE TRUE
To coerce the logical values to numeric values, you can use apply() and as.numeric, but then you have some work left to replace the row names:
apply(xtabs(Order ~ idoA + idrA, dat) != 0, 2, as.numeric)
Or, you can use a little trick by adding 0 to the values. This coerces the logical values to numeric:
(xtabs(Order ~ idoA + idrA, dat) != 0) + 0
idrA
idoA 1 2 3 4 5 6
1 1 0 0 0 0 0
2 1 0 0 0 0 0
3 1 0 0 0 0 0
....
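The formula `Order ~ idoA + idrA` sums the order numbers in each cell and then tests for nonzero. A slightly different sketch, not the answer's exact method, counts rows instead by leaving the left-hand side of the formula empty, so it does not depend on `Order` being numeric. It uses a small subset of rows taken from the question's data; `incidence` is a name introduced here.

```r
# A few rows from the question's data (Object column omitted)
dat <- data.frame(
  Order = c(8001505892, 8001505342, 8001505342,
            8001507112, 8001507112, 8001507112, 8001505452),
  idrA  = c(1, 2, 3, 3, 4, 5, 6),
  idoA  = c(1, 9, 9, 20, 20, 20, 30)
)

# With no left-hand side, xtabs counts the rows in each (order, object) cell
tab <- xtabs(~ idoA + idrA, data = dat)

# Any positive count means the object appeared in the order; + 0 gives 0/1
incidence <- (tab > 0) + 0
incidence

# The order with ido = 20 contains objects 3, 4 and 5
incidence["20", ]
```

Row and column names are kept, so individual orders can be looked up by their ID as a character string.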

Another option is to use acast from reshape2
library(reshape2)
res1 <- (acast(dat, idoA~idrA, value.var='Order', fill=0)!=0)+0
head(res1)
# 1 2 3 4 5 6
#1 1 0 0 0 0 0
#2 1 0 0 0 0 0
#3 1 0 0 0 0 0
#4 1 0 0 0 0 0
#5 1 0 0 0 0 0
#6 1 0 0 0 0 0
Or using dplyr/tidyr
library(dplyr)
library(tidyr)
dat %>%
select(-Object) %>%
spread(idrA, Order, fill=0) %>%
mutate(across(-idoA, ~ (.x != 0) + 0)) %>%
head()
#idoA 1 2 3 4 5 6
#1 1 1 0 0 0 0 0
#2 2 1 0 0 0 0 0
#3 3 1 0 0 0 0 0
#4 4 1 0 0 0 0 0
#5 5 1 0 0 0 0 0
#6 6 1 0 0 0 0 0
