Operation of matrix in R based on dimension names - r

I have a symmetric matrix s defined as:
s<-matrix(1:25,5)
s[lower.tri(s)] = t(s)[lower.tri(s)]
dimnames(s) <- list(LETTERS[1:5], LETTERS[1:5])
s
A B C D E
A 1 6 11 16 21
B 6 7 12 17 22
C 11 12 13 18 23
D 16 17 18 19 24
E 21 22 23 24 25
In addition, there is a vector t defined as:
t <- seq(1,10)
names(t) <- c('C_A', 'E_A', 'E_B', 'E_C', 'E_D', 'D_A', 'D_B', 'D_C', 'C_B', 'A_B')
Now I would like to add the elements of t to the upper and lower triangular elements of s such that the element of t named 'C_A' is added to the elements of s with row and column names 'C' and 'A' (or 'A' and 'C'), the element of t named 'E_A' is added to the elements of s with row and column names 'E' and 'A' (or 'A' and 'E'), etc. For example, both s['A','B'] and s['B','A'] should be increased by t['A_B'], and similarly for all other off-diagonal elements. The diagonal should be left unchanged.
What is an elegant way to achieve this?

This is not especially elegant but:
s<-matrix(1:25,5)
s[lower.tri(s)] = t(s)[lower.tri(s)]
dimnames(s) <- list(LETTERS[1:5], LETTERS[1:5])
t <- seq(1,10)
names(t) <- c('C_A', 'E_A', 'E_B', 'E_C', 'E_D', 'D_A', 'D_B', 'D_C', 'C_B', 'A_B')
s[t(do.call(cbind, strsplit(names(t), split = "_")))] <-
s[t(do.call(cbind, strsplit(names(t), split = "_")))] + t
s
#> A B C D E
#> A 1 16 11 16 21
#> B 6 7 12 17 22
#> C 12 21 13 18 23
#> D 22 24 26 19 24
#> E 23 25 27 29 25
To update the mirrored [j,i] elements as well, run it again with the index columns reversed:
s[t(do.call(cbind, strsplit(names(t), split = "_")))[,2:1]] <-
s[t(do.call(cbind, strsplit(names(t), split = "_")))[,2:1]] + t
s
#> A B C D E
#> A 1 16 12 22 23
#> B 16 7 21 24 25
#> C 12 21 13 26 27
#> D 22 24 26 19 29
#> E 23 25 27 29 25
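Both directions can also be handled in a single assignment by stacking the index matrix on top of its column-reversed copy. A minimal sketch, assuming every name has the form 'row_col' and each unordered pair occurs only once; the vector is called tv here (an illustrative name) so that it does not share a name with the base function t():

```r
# Rebuild the example data; 'tv' stands in for the question's vector 't'
s <- matrix(1:25, 5)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
dimnames(s) <- list(LETTERS[1:5], LETTERS[1:5])
tv <- setNames(1:10, c('C_A', 'E_A', 'E_B', 'E_C', 'E_D',
                       'D_A', 'D_B', 'D_C', 'C_B', 'A_B'))

# Two-column character matrix: one (row name, column name) pair per element
idx <- do.call(rbind, strsplit(names(tv), "_", fixed = TRUE))

# Stack the [i, j] pairs on top of the [j, i] pairs and repeat the values to match
both <- rbind(idx, idx[, 2:1])
s[both] <- s[both] + rep(tv, 2)

# The matrix should still be symmetric, with the diagonal untouched
isSymmetric(s)
```

If a pair could appear more than once in the names, this form would not accumulate the duplicates, because matrix assignment keeps only the last value written to a cell.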

Use outer to create row/col and col/row indexes, then overwrite the corresponding values of s:
sel1 <- match(names(t), outer(rownames(s),colnames(s), function(x,y) paste(x,y,sep="_")))
sel2 <- match(names(t), outer(rownames(s),colnames(s), function(x,y) paste(y,x,sep="_")))
s[sel1] <- s[sel1]+t
s[sel2] <- s[sel2]+t
# A B C D E
#A 1 16 12 22 23
#B 16 7 21 24 25
#C 12 21 13 26 27
#D 22 24 26 19 29
#E 23 25 27 29 25

You can use matrix indexing with a two-column matrix of character values for row and col:
nt <- strsplit(names(t), "_")
dnt <- data.frame(n=t, t(data.frame(nt)))
s[ as.matrix(dnt[-1]) ] <- s[ as.matrix(dnt[-1]) ] + t
s
#-----------
A B C D E
A 1 16 11 16 21
B 6 7 12 17 22
C 12 21 13 18 23
D 22 24 26 19 24
E 23 25 27 29 25
Then do the same with the two name columns swapped (run this only once, otherwise the values are added again):
s[as.matrix(dnt[c(3,2)])] <- s[as.matrix(dnt[c(3,2)])] + t
s
#----------
A B C D E
A 1 16 12 22 23
B 16 7 21 24 25
C 12 21 13 26 27
D 22 24 26 19 29
E 23 25 27 29 25

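You can also loop over the elements of t and update both positions explicitly:
t1 <- as.list(t)
for(i in seq_along(t1))
{
nam <- unlist(strsplit(names(t1[i]),"_"))
# add to the existing value rather than overwriting it
s[nam[1],nam[2]] <- s[nam[1],nam[2]] + t[[i]]
s[nam[2],nam[1]] <- s[nam[2],nam[1]] + t[[i]]
}
> s
A B C D E
A 1 16 12 22 23
B 16 7 21 24 25
C 12 21 13 26 27
D 22 24 26 19 29
E 23 25 27 29 25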

Related

Creating an index for each subject in R

I'm working with some data on repeated measures of subjects over time. The data is in this format:
Subject <- as.factor(c(rep("A", 20), rep("B", 35), rep("C", 13)))
variable.A <- rnorm(mean = 300, sd = 50, n = length(Subject))
dat <- data.frame(Subject, variable.A)
dat
Subject variable.A
1 A 334.6567
2 A 353.0988
3 A 244.0863
4 A 284.8918
5 A 302.6442
6 A 298.3162
7 A 271.4864
8 A 268.6848
9 A 262.3761
10 A 341.4224
11 A 190.4823
12 A 297.1981
13 A 319.8346
14 A 343.9855
15 A 332.5318
16 A 221.9502
17 A 412.9172
18 A 283.4206
19 A 310.9847
20 A 276.5423
21 B 181.5418
22 B 340.5812
23 B 348.5162
24 B 364.6962
25 B 312.2508
26 B 278.9855
27 B 242.8810
28 B 272.9585
29 B 239.2776
30 B 254.9140
31 B 253.8940
32 B 330.1918
33 B 300.7302
34 B 237.6511
35 B 314.4919
36 B 239.6195
37 B 282.7955
38 B 260.0943
39 B 396.5310
40 B 325.5422
41 B 374.8063
42 B 363.1897
43 B 258.0310
44 B 358.8605
45 B 251.8775
46 B 299.6995
47 B 303.4766
48 B 359.8955
49 B 299.7089
50 B 289.3128
51 B 401.7680
52 B 276.8078
53 B 441.4852
54 B 232.6222
55 B 305.1977
56 C 298.4580
57 C 210.5164
58 C 272.0228
59 C 282.0540
60 C 207.8797
61 C 263.3859
62 C 324.4417
63 C 273.5904
64 C 348.4389
65 C 174.2979
66 C 363.4353
67 C 260.8548
68 C 306.1833
I've used the seq_along() function and the dplyr package to create an index of each observation for every subject:
library(dplyr)
dat <- as.data.frame(dat %>%
group_by(Subject) %>%
mutate(index = seq_along(Subject)))
Subject variable.A index
1 A 334.6567 1
2 A 353.0988 2
3 A 244.0863 3
4 A 284.8918 4
5 A 302.6442 5
6 A 298.3162 6
7 A 271.4864 7
8 A 268.6848 8
9 A 262.3761 9
10 A 341.4224 10
11 A 190.4823 11
12 A 297.1981 12
13 A 319.8346 13
14 A 343.9855 14
15 A 332.5318 15
16 A 221.9502 16
17 A 412.9172 17
18 A 283.4206 18
19 A 310.9847 19
20 A 276.5423 20
21 B 181.5418 1
22 B 340.5812 2
23 B 348.5162 3
24 B 364.6962 4
25 B 312.2508 5
26 B 278.9855 6
27 B 242.8810 7
28 B 272.9585 8
29 B 239.2776 9
30 B 254.9140 10
31 B 253.8940 11
32 B 330.1918 12
33 B 300.7302 13
34 B 237.6511 14
35 B 314.4919 15
36 B 239.6195 16
37 B 282.7955 17
38 B 260.0943 18
39 B 396.5310 19
40 B 325.5422 20
41 B 374.8063 21
42 B 363.1897 22
43 B 258.0310 23
44 B 358.8605 24
45 B 251.8775 25
46 B 299.6995 26
47 B 303.4766 27
48 B 359.8955 28
49 B 299.7089 29
50 B 289.3128 30
51 B 401.7680 31
52 B 276.8078 32
53 B 441.4852 33
54 B 232.6222 34
55 B 305.1977 35
56 C 298.4580 1
57 C 210.5164 2
58 C 272.0228 3
59 C 282.0540 4
60 C 207.8797 5
61 C 263.3859 6
62 C 324.4417 7
63 C 273.5904 8
64 C 348.4389 9
65 C 174.2979 10
66 C 363.4353 11
67 C 260.8548 12
68 C 306.1833 13
What I'm now looking to do is set up an analysis that looks at every 10 observations, so I'd like to create another column that gives me a group number for every 10 observations. For example, Subject A would have a sequence of ten "1's" followed by a sequence of ten "2's" (i.e., two groupings of 10). I've tried to use the rep() function, but the issue I'm running into is that the other subjects don't have a number of observations that is divisible by 10.
Is there a way for the rep() function to just assign the grouping the next number, even if it doesn't have 10 total observations? For example, Subject B would have ten "1's", ten "2's", ten "3's" and then five "4's" (the last being his final, incomplete group of observations)?
You can use integer division %/% to generate the ids:
dat %>%
group_by(Subject) %>%
mutate(chunk_id = (seq_along(Subject) - 1) %/% 10 + 1) -> dat1
table(dat1$Subject, dat1$chunk_id)
# 1 2 3 4
# A 10 10 0 0
# B 10 10 10 5
# C 10 3 0 0
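To see why the - 1 and + 1 are needed: %/% is integer division, so the within-subject positions 1 to 10 first have to be shifted to 0 to 9 before dividing by 10. A small base-R sketch with a made-up subject vector (the group sizes 20, 35 and 13 mirror the question):

```r
# Within-subject position 1..n for subjects with 20, 35 and 13 observations
Subject <- rep(c("A", "B", "C"), times = c(20, 35, 13))
index <- ave(seq_along(Subject), Subject, FUN = seq_along)

# Shift to 0-based, integer-divide by the chunk size, shift back to 1-based
chunk_id <- (index - 1) %/% 10 + 1

# Cross-tabulate to check the chunk sizes per subject
tab <- table(Subject, chunk_id)
tab
```

Without the shift, observation 10 would already fall into the second chunk, since 10 %/% 10 is 1 while 9 %/% 10 is 0.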
For a plain vanilla base R solution, you also could try this:
dat$newcol <- 1
dat$index <- ave(dat$newcol, dat$Subject, FUN = cumsum)
dat$chunk_id <- (dat$index - 1) %/% 10 + 1
which, when you run the table command as above gives you
table(dat$Subject, dat$chunk_id)
1 2 3 4
A 10 10 0 0
B 10 10 10 5
C 10 3 0 0
If you don't want the extra 'newcol' column, just use 'NULL' to get rid of it:
dat$newcol <- NULL

reordering selected column of data.table in r

I have a data table dt which contains 500 columns. I need to pick 6 columns, say (a,c,k,m,n,o), from the data table and move them to the front of the data table.
Is there any way of doing this ?
We can create a vector of the columns of interest ('nm1'), then concatenate that with the column names that are not found in 'nm1' (using setdiff). In data.table, for subsetting columns by a character vector, we use with = FALSE.
nm1 <- c('a', 'c', 'k', 'm', 'n', 'o')
dt[, c(nm1, setdiff(names(dt), nm1)), with=FALSE]
Another option is setcolorder, but the above method is more convenient, as it does not change the column order of the original dataset.
NOTE: No external packages used.
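The same c(nm1, setdiff(...)) idea also works on a plain data.frame without data.table. A minimal sketch; the 26-column frame below is made up purely for illustration:

```r
# Toy one-row data.frame with columns a..z
df <- as.data.frame(as.list(setNames(1:26, letters)))

# Columns to move to the front, followed by all remaining columns
# in their original order
nm1 <- c("a", "c", "k", "m", "n", "o")
new_order <- c(nm1, setdiff(names(df), nm1))

# Indexing returns a reordered copy; df itself is left unchanged
df2 <- df[new_order]

names(df2)[1:8]
```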
Whether dealing with data.frames or data.tables, I would suggest loading "data.table" and using setcolorder.
Paired up with moveMe from my "SOfun" package, you have a very flexible means of reordering columns.
Loading package and creating sample data:
library(SOfun)
library(data.table)
DT <- as.data.table(as.list(setNames(1:26, letters)))
DF <- setDF(copy(DT))
DT
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
DF
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Moving columns:
setcolorder(DT, moveMe(names(DT), "a,c,k,m,n,o first"))
DT
# a c k m n o b d e f g h i j l p q r s t u v w x y z
# 1: 1 3 11 13 14 15 2 4 5 6 7 8 9 10 12 16 17 18 19 20 21 22 23 24 25 26
setcolorder(DF, moveMe(names(DF), "a,c,k,m,n,o first"))
DF
# a c k m n o b d e f g h i j l p q r s t u v w x y z
# 1 1 3 11 13 14 15 2 4 5 6 7 8 9 10 12 16 17 18 19 20 21 22 23 24 25 26
Beyond "first", you also have "last", "before", and "after".
setcolorder(DF, moveMe(names(DF), "a,c,k,m,n,o first; l,e,q,r,w last"))
DF
# a c k m n o b d f g h i j p s t u v x y z l e q r w
# 1 1 3 11 13 14 15 2 4 6 7 8 9 10 16 19 20 21 22 24 25 26 12 5 17 18 23

Combining a list of named vectors without mangling the names

How do I combine a list of named vectors? I need to split a vector of integers (with characters for names) for use with parallel::parSapply() and combine them back again. Example code:
text <- 1:26
names(text) <- letters
n <- 4
text <- split(text, cut(1:length(text),breaks=n,labels=1:n))
# text <- parSapply(..., text, ...) would go here in the actual code
However, the names get mangled when I use unlist to convert the data back into a named vector:
> unlist(text)
1.a 1.b 1.c 1.d 1.e 1.f 1.g 2.h 2.i 2.j 2.k 2.l 2.m 3.n 3.o 3.p 3.q 3.r 3.s 4.t 4.u 4.v
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
4.w 4.x 4.y 4.z
23 24 25 26
What I'm looking for is the following result (except that it should work with any value of n):
> c(text[[1]],text[[2]],text[[3]],text[[4]])
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
One option that leaves the structure of 'text' unchanged is to replace the names of the unlisted vector (unlist(text)) with the names of the objects inside the list elements.
setNames(unlist(text), unlist(sapply(text, names)))
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Or if it is okay to remove the names of the 'text' object, set the names of 'text' to NULL and then unlist
unlist(setNames(text, NULL))
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
You can remove the list element names first; then no compound names are created.
> names(text) <- NULL
> do.call(c, text)
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Same as
> unlist(text)
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Or, as @RichardScriven pointed out in the comments, you can do it as follows without removing the names in the source variable: do.call("c", c(text, use.names = FALSE))
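The mangling comes from unlist() pasting the list-level names ("1", "2", ...) onto the element names. A self-contained sketch of that behaviour and of an unname() variant that works on a copy and leaves the original list untouched; the object is called chunks here rather than text, purely for illustration:

```r
x <- setNames(1:26, letters)
n <- 4
chunks <- split(x, cut(seq_along(x), breaks = n, labels = 1:n))

# With list names present, unlist() builds compound names such as "1.a"
mangled <- unlist(chunks)

# Dropping the list-level names on a copy keeps the inner names as they are
restored <- unlist(unname(chunks))

head(names(mangled))
head(names(restored))
```

Because unname() returns a modified copy, chunks still carries its "1" to "4" names afterwards, which can matter if the list is reused later.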

Add data frames row wise with [d]plyr

I have two data frames
df1
# a b
# 1 10 20
# 2 11 21
# 3 12 22
# 4 13 23
# 5 14 24
# 6 15 25
df2
# a b
# 1 4 8
I want the following output:
df3
# a b
# 1 14 28
# 2 15 29
# 3 16 30
# 4 17 31
# 5 18 32
# 6 19 33
i.e. add df2 to each row of df1.
Is there a way to get the desired output using plyr (mdplyr??) or dplyr?
I see no reason for "dplyr" for something like this. In base R you could just do:
df1 + unclass(df2)
# a b
# 1 14 28
# 2 15 29
# 3 16 30
# 4 17 31
# 5 18 32
# 6 19 33
Which is the same as df1 + list(4, 8).
One liner with dplyr.
mutate_each(df1, funs(.+ df2$.), a:b)
# a b
#1 14 28
#2 15 29
#3 16 30
#4 17 31
#5 18 32
#6 19 33
A base R solution using the sweet function sweep:
sweep(df1, 2, unlist(df2), '+')
# a b
#1 14 28
#2 15 29
#3 16 30
#4 17 31
#5 18 32
#6 19 33
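For reference, a self-contained base-R sketch that rebuilds the two data frames from the question and adds the one-row frame column by column with Map(). This is an alternative formulation rather than one of the answers above, and it assumes df2 has exactly one row and the same column names as df1:

```r
df1 <- data.frame(a = 10:15, b = 20:25)
df2 <- data.frame(a = 4L, b = 8L)

# Add each df2 value to the df1 column with the same name;
# df3[] <- ... keeps df3 a data.frame while replacing its columns
df3 <- df1
df3[] <- Map(`+`, df1, df2[names(df1)])

df3
```

Matching columns by name via df2[names(df1)] means the result does not depend on the column order of df2.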

All possible unique pair combinations of gamete positions

I have some gamete data in the following format:
Ind Letter Place Position
1 A 19 23
2 B 19 23
3 B 19 23
4 B 19 23
1 B 19 34
2 A 19 34
3 B 19 34
4 B 19 34
1 C 19 52
2 T 19 52
3 C 19 52
4 T 19 52
1 T 33 15
2 T 33 15
3 T 33 15
4 C 33 15
1 C 33 26
2 T 33 26
3 T 33 26
4 C 33 26
dput of data:
structure(list(Ind = c(1L,2L,3L,4L,1L,2L,3L,4L,1L,2L,3L,4L,1L,2L,3L,4L,1L,2L,3L,4L),
Letter = structure(c(1L,2L,2L,2L,2L,1L,2L,2L,3L,4L,3L,4L,4L,4L,4L,3L,3L,4L,4L,3L),
.Label = c("A","B","C","T"), class="factor"),
Place = c(19L,19L,19L,19L,19L,19L,19L,19L,19L,19L,19L,19L,33L,33L,33L,33L,33L,33L,33L,33L),
Position = c(23L,23L,23L,23L,34L,34L,34L,34L,52L,52L,52L,52L,15L,15L,15L,15L,26L,26L,26L,26L)),
.Names = c("Ind","Letter","Place","Position"),
class="data.frame", row.names = c(NA,-20L))
I need to pair and combine them, so I get all possible unique combinations with reference to Position within a pair. I have another data-file, that contains information on the pairs, and they are paired with reference to Place. So in this file I may see, that Place 19+Place 33 is a pair, and I want the following result:
Ind Letter Place Position Ind Letter Place Position
1 A 19 23 1 T 33 15
2 B 19 23 2 T 33 15
3 B 19 23 3 T 33 15
4 B 19 23 4 C 33 15
1 A 19 23 1 C 33 26
2 B 19 23 2 T 33 26
3 B 19 23 3 T 33 26
4 B 19 23 4 C 33 26
1 B 19 34 1 T 33 15
2 A 19 34 2 T 33 15
3 B 19 34 3 T 33 15
4 B 19 34 4 C 33 15
1 B 19 34 1 C 33 26
2 A 19 34 2 T 33 26
3 B 19 34 3 T 33 26
4 B 19 34 4 C 33 26
1 C 19 52 1 T 33 15
2 T 19 52 2 T 33 15
3 C 19 52 3 T 33 15
4 T 19 52 4 C 33 15
1 C 19 52 1 C 33 26
2 T 19 52 2 T 33 26
3 C 19 52 3 T 33 26
4 T 19 52 4 C 33 26
In this case unique means that A1:A2 is equal to A2:A1.
The reason I want to do this is that I want to run a Four-Gamete Test on the pairs, to see whether all possible combinations of Letter are present. So, e.g., for the last combined pair above, we have the letter pairs CC, TT, CT, TC, so this combined pair will pass the FGT.
I have tried to do the combining with expand.grid, as it seems quite close to what I want. However, when I request all combinations of data$Position, I lose the information for Ind, Letter, and Place. The output also includes non-unique pairs.
Can anyone point me to a tool that is closer to what I want, or give me some guidelines on how to modify expand.grid to get what I need?
If you are aware of a tool that actually performs the Four-Gamete Test, or something similar, that would of course also be interesting for me to look at.
You can use expand.grid but not directly on the Position column. The idea is to find all combinations of the "quartets" (unique Positions):
pair <- c(19, 33)
df1 <- df1[df1$Place %in% pair, ]
split1 <- split( df1, df1$Position)
vec1 <- unique(df1$Position[df1$Place == pair[1]])
vec2 <- unique(df1$Position[df1$Place == pair[2]])
combin_num <- expand.grid(vec2, vec1)[,2:1]
do.call(
rbind,
lapply(seq_len(nrow(combin_num)), function(i){
cbind( split1[[as.character(combin_num[i,1])]],
split1[[as.character(combin_num[i,2])]] )
})
)[,]
Result:
# Ind Letter Place Position Ind.1 Letter.1 Place.1 Position.1
# 1 1 A 19 23 1 T 33 15
# 2 2 B 19 23 2 T 33 15
# 3 3 B 19 23 3 T 33 15
# 4 4 B 19 23 4 C 33 15
# 5 1 A 19 23 1 C 33 26
# 6 2 B 19 23 2 T 33 26
# 7 3 B 19 23 3 T 33 26
# 8 4 B 19 23 4 C 33 26
# 51 1 B 19 34 1 T 33 15
# 61 2 A 19 34 2 T 33 15
# 71 3 B 19 34 3 T 33 15
# 81 4 B 19 34 4 C 33 15
# 52 1 B 19 34 1 C 33 26
# 62 2 A 19 34 2 T 33 26
# 72 3 B 19 34 3 T 33 26
# 82 4 B 19 34 4 C 33 26
# 9 1 C 19 52 1 T 33 15
# 10 2 T 19 52 2 T 33 15
# 11 3 C 19 52 3 T 33 15
# 12 4 T 19 52 4 C 33 15
# 91 1 C 19 52 1 C 33 26
# 101 2 T 19 52 2 T 33 26
# 111 3 C 19 52 3 T 33 26
# 121 4 T 19 52 4 C 33 26
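Once two positions are lined up by Ind, the four-gamete check described in the question reduces to counting distinct letter pairs. A hedged sketch: the helper name fgt_pass and the rule "exactly two letters at each position and all four combinations observed" are assumptions about how the test is meant to be applied, not something taken from the answer above:

```r
# Letters of individuals 1..4 at four of the positions from the question
pos52 <- c("C", "T", "C", "T")   # Place 19, Position 52
pos26 <- c("C", "T", "T", "C")   # Place 33, Position 26
pos23 <- c("A", "B", "B", "B")   # Place 19, Position 23
pos15 <- c("T", "T", "T", "C")   # Place 33, Position 15

# Hypothetical helper: TRUE when all four two-letter gametes are observed
fgt_pass <- function(x, y) {
  stopifnot(length(x) == length(y))
  gametes <- unique(paste0(x, y))
  length(unique(x)) == 2 && length(unique(y)) == 2 && length(gametes) == 4
}

fgt_pass(pos52, pos26)  # CC, TT, CT and TC all occur
fgt_pass(pos23, pos15)  # only AT, BT and BC occur
```

The helper could be applied to each block of the combined result, for instance by splitting on the two Position columns and passing the two Letter columns of each block.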
