Recursively set dimnames on a list of matrices - r

On a list of matrices, I'd like to set only the colnames and leave the rownames as NULL. The matrices all have different dimensions. Unlike this example, the names are specific to each matrix.
provideDimnames gets me in the ballpark, but I'm having trouble telling it to ignore the NULL row names, and only set the column names. Here are my attempts.
> L <- list(matrix(1:6, 2), matrix(1:20, 5))
> dimnm <- list(list(NULL, letters[1:3]), list(NULL, letters[1:4]))
> lapply(L, provideDimnames, base = dimnm)
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, provideDimnames, base = list(dimnm))
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, provideDimnames, base = list(letters))
# [[1]]
# a b c
# a 1 3 5
# b 2 4 6
#
# [[2]]
# a b c d
# a 1 6 11 16
# b 2 7 12 17
# c 3 8 13 18
# d 4 9 14 19
# e 5 10 15 20
Almost, but I want [n,] for the row names. The desired result is:
> dimnames(L[[1]]) <- list(NULL, letters[1:3])
> dimnames(L[[2]]) <- list(NULL, letters[1:4])
> L
# [[1]]
# a b c
# [1,] 1 3 5
# [2,] 2 4 6
#
# [[2]]
# a b c d
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
> lapply(L, provideDimnames, base = list(NULL, letters))
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, `colnames<-`, , letters)
# Error in FUN(X[[1L]], ...) :
# unused argument (c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
# "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"))
Is there a way to do this with provideDimnames()? setNames() wouldn't accept a list for the dim-names either.

How about something like this?
L <- list(matrix(1:6, 2), matrix(1:20, 5))
nms <- list(letters[1:3], letters[23:26])
mapply(function(X, Y) {colnames(X) <- Y; X}, L, nms, SIMPLIFY = FALSE)
[[1]]
a b c
[1,] 1 3 5
[2,] 2 4 6
[[2]]
w x y z
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20

You can do this relatively easily, but you are complicating it by trying to set both dimnames when really you just want to change the column names. I would go about it this way:
## different dimnames; list of only the colnames
dimnm <- list(letters[1:3], letters[1:4])
## function to lapply which does the change
cnames <- function(i, lmat, names) {
  colnames(lmat[[i]]) <- names[[i]]
  lmat[[i]]
}
## do the change
L2 <- lapply(seq_along(L), cnames, lmat = L, names = dimnm)
L2
Gives us:
> L2
[[1]]
a b c
[1,] 1 3 5
[2,] 2 4 6
[[2]]
a b c d
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
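For what it's worth, the same result can probably be had without an explicit helper, since Map() walks the matrices and their names in parallel and always returns a list. This is only a minimal sketch, assuming the per-matrix names are kept in a plain list; nms and dimnm simply mirror the objects used in this thread, and L_cols / L_dims are names made up for the illustration.

```r
L <- list(matrix(1:6, 2), matrix(1:20, 5))
nms <- list(letters[1:3], letters[1:4])

# `colnames<-`(x, value) is the replacement function behind colnames(x) <- value
L_cols <- Map(`colnames<-`, L, nms)

# Same idea with full dimnames, keeping the row names NULL
dimnm <- list(list(NULL, letters[1:3]), list(NULL, letters[1:4]))
L_dims <- Map(function(m, dn) { dimnames(m) <- dn; m }, L, dimnm)
```

The attempt lapply(L, `colnames<-`, , letters) in the question fails because `colnames<-` only takes x and value, so the extra empty argument leaves letters unused.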


How to calculate the mean for every n vectors from a df

I want to calculate the mean for every n vectors (columns) of a data frame, creating a new data frame with the results.
I expect to get:
column 1: mean(V1, V2),
column 2: mean(V3, V4),
column 3: mean(V5, V6),
and so forth.
data
df <- data.frame(v1=1:6,V2=7:12,V3=13:18,v4=19:24,v5=25:30,v6=31:36)
Here is a base R option
n <- 2 # Mean across every n = 2 columns
do.call(cbind, lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)])))
# [,1] [,2] [,3]
#[1,] 4 16 28
#[2,] 5 17 29
#[3,] 6 18 30
#[4,] 7 19 31
#[5,] 8 20 32
#[6,] 9 21 33
This returns a matrix rather than a data.frame (which makes more sense here since you're dealing with "all-numeric" data).
Explanation: The idea is a non-overlapping sliding window approach. seq(1, ncol(df), by = n) creates the start indices of the columns (here: 1, 3, 5). We then loop over those indices idx and calculate the row means of df[c(idx, idx + 1)]. This returns a list which we then cbind into a matrix.
As a minor modification, you can also predefine a data.frame with the right dimensions and then skip the do.call(cbind, ...) step by having R do an implicit list-to-data.frame conversion.
out <- data.frame(matrix(NA, ncol = ncol(df) / n, nrow = nrow(df)))
out[] <- lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)]))
# X1 X2 X3
#1 4 16 28
#2 5 17 29
#3 6 18 30
#4 7 19 31
#5 8 20 32
#6 9 21 33
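The window above is written for pairs of columns (idx and idx + 1). A possible generalisation to any group size n is sketched below; it assumes n divides ncol(df) evenly, and the mean_1, mean_2 column names are made up for the illustration.

```r
df <- data.frame(v1 = 1:6, V2 = 7:12, V3 = 13:18, v4 = 19:24, v5 = 25:30, v6 = 31:36)
n <- 3  # group size; assumed to divide ncol(df) evenly
stopifnot(ncol(df) %% n == 0)

# Start index of each non-overlapping block of n columns
starts <- seq(1, ncol(df), by = n)

# Row means over columns idx, idx + 1, ..., idx + n - 1
means <- lapply(starts, function(idx) rowMeans(df[idx:(idx + n - 1)]))
names(means) <- paste0("mean_", seq_along(starts))  # hypothetical column names

out <- as.data.frame(means)
```

With n <- 3 this should give two columns, one per block of three input columns.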
You may try,
dummy <- data.frame(
v1 = c(1:10),
v2 = c(1:10),
v3 = c(1:10),
v4 = c(1:10),
v5 = c(1:10),
v6 = c(1:10)
)
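nvec_mean <- function(df, n){
  # Check divisibility first; otherwise matrix() below would recycle indices
  if (ncol(df) %% n != 0){
    stop("ncol(df) must be a multiple of n")
  }
  res <- c()
  # Each row of m holds the column indices of one group of n columns
  m <- matrix(1:ncol(df), ncol = n, byrow = TRUE)
  for (i in 1:nrow(m)){
    v <- rowMeans(df[, m[i,], drop = FALSE])
    res <- cbind(res, v)
  }
  colnames(res) <- c(1:nrow(m))
  res
}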
nvec_mean(dummy,3)
1 2
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 6
[7,] 7 7
[8,] 8 8
[9,] 9 9
[10,] 10 10
If you didn't want rowMeans or result is not what you wanted, please let me know.
Simple(?) version
df <- data.frame(v1=1:6,V2=7:12,V3=13:18,v4=19:24,v5=25:30,v6=31:36)
n <- 2
res <- c()
m <- matrix(1:ncol(df), ncol = n, byrow = TRUE)
for (i in 1:nrow(m)){
  v <- rowMeans(df[,m[i,]])
  res <- cbind(res, v)
}
res
v v v
[1,] 4 16 28
[2,] 5 17 29
[3,] 6 18 30
[4,] 7 19 31
[5,] 8 20 32
[6,] 9 21 33

Match vectors in sequence

I have 2 vectors.
x=c("a", "b", "c", "d", "a", "b", "c")
y=structure(c(1, 2, 3, 4, 5, 6, 7, 8), .Names = c("a", "e", "b",
"c", "d", "a", "b", "c"))
I would like to match a to a, b to b in sequence accordingly, so that x[2] matches y[3] rather than y[7]; and x[5] matches y[6] rather than y[1], so on and so forth.
lapply(x, function(z) grep(z, names(y), fixed=T))
gives:
[[1]]
[1] 1 6
[[2]]
[1] 3 7
[[3]]
[1] 4 8
[[4]]
[1] 5
[[5]]
[1] 1 6
[[6]]
[1] 3 7
[[7]]
[1] 4 8
which matches all instances. How do I get this sequence:
1 3 4 5 6 7 8
So that elements in x can be mapped to the corresponding values in y accordingly?
You are actually looking for pmatch
pmatch(x,names(y))
[1] 1 3 4 5 6 7 8
You can change the names attributes according to the number of times each element appeared and then subset y:
x2 <- paste0(x, ave(x, x, FUN=seq_along))
#[1] "a1" "b1" "c1" "d1" "a2" "b2" "c2"
names(y) <- paste0(names(y), ave(names(y), names(y), FUN=seq_along))
y[x2]
#a1 b1 c1 d1 a2 b2 c2
# 1 3 4 5 6 7 8
Another option using Reduce
Reduce(function(v, k) y[-seq_len(v)][k],
x=x[-1L],
init=y[x[1L]],
accumulate=TRUE)
Well, I did it with a for-loop
#Initialise the vector with length same as x.
answer <- numeric(length(x))
for (i in seq_along(x)) {
  #match the ith element of x with that of names in y.
  answer[i] <- match(x[i], names(y))
  #Replace the name of the matched element to empty string so next time you
  #encounter it you get the next index.
  names(y)[answer[i]] <- ""
}
answer
#[1] 1 3 4 5 6 7 8
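If you would rather not touch names(y) at all, the same loop can be wrapped in a small function that works on its own copy of the lookup vector. This is just a sketch; match_in_order is a made-up name, and it assumes x contains no NA values (an NA in x would match a slot that has already been used up).

```r
match_in_order <- function(x, table) {
  out <- integer(length(x))
  for (i in seq_along(x)) {
    hit <- match(x[i], table)
    out[i] <- hit
    # Mark the matched slot as used so the next lookup moves on to a later one
    if (!is.na(hit)) table[hit] <- NA
  }
  out
}

x <- c("a", "b", "c", "d", "a", "b", "c")
y <- structure(c(1, 2, 3, 4, 5, 6, 7, 8),
               .Names = c("a", "e", "b", "c", "d", "a", "b", "c"))
idx <- match_in_order(x, names(y))
```

Because table is a local copy inside the function, y keeps its original names afterwards, and y[idx] gives the values in the wanted order.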
Another possibility:
l <- lapply(x, grep, x = names(y), fixed = TRUE)
i <- as.integer(ave(x, x, FUN = seq_along))
mapply(`[`, l, i)
which gives:
[1] 1 3 4 5 6 7 8
Similar solution to Ronak, but it does not persist changes to y
yFoo<-names(y)
sapply(x,function(u){res<-match(u,yFoo);yFoo[res]<<-"foo";return(res)})
Result
#a b c d a b c
#1 3 4 5 6 7 8

Create a specific number of vectors for a list

I need to create a list of "N" vectors of length "L" that begin at number "B". If I specify N=3, L=4 and B=5, I need a list of the following three vectors:
5, 6, 7, 8
9, 10, 11, 12
13, 14, 15, 16
I can do it manually one by one but I have sometimes 20 or 30 vectors to create with always different lengths.
I would appreciate if someone could give me a hand with this.
Cheers
Carlos
If you are happy with matrix as an output...
N <- 3
L <- 4
B <- 5
x <- seq(from = B, to = B + N * L - 1)
y <- matrix(x, nrow = N, byrow = TRUE)
y
# [,1] [,2] [,3] [,4]
# [1,] 5 6 7 8
# [2,] 9 10 11 12
# [3,] 13 14 15 16
Taking the matrix to list via transposition and data.frame...
as.list(as.data.frame(t(y)))
# $V1
# [1] 5 6 7 8
#
# $V2
# [1] 9 10 11 12
#
# $V3
# [1] 13 14 15 16
I'm showing it this way partly because I've never liked the coercion of numbers to colnames; there are certainly other ways to handle that. The transposition can be dropped if you set y <- matrix(x, nrow = L) instead. You can also drop the as.list, because technically a data.frame is a list.
as.data.frame(y)
# V1 V2 V3
# 1 5 9 13
# 2 6 10 14
# 3 7 11 15
# 4 8 12 16
You can use split() to get a list output.
split(seq(B, B + L*N - 1), (1:(L*N) - 1) %/% L)
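A variant of the split() idea that may be easier to read builds the grouping vector with rep(), so each of the N groups is spelled out as L repeats of its label. A minimal sketch, where vals, grp and res are names made up for the illustration:

```r
N <- 3  # number of vectors
L <- 4  # length of each vector
B <- 5  # first value

vals <- seq(B, length.out = N * L)   # B, B + 1, ..., B + N * L - 1
grp  <- rep(seq_len(N), each = L)    # 1 1 1 1 2 2 2 2 3 3 3 3
res  <- split(vals, grp)
```

res is then a list of N vectors named "1", "2", "3", each of length L.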

How to understand a specific function in R

Here is a simple function in R:
no.dimnames <- function(a) {
  ## Remove all dimension names from an array for compact printing.
  d <- list()
  l <- 0
  for(i in dim(a)) {
    d[[l <- l + 1]] <- rep("", i)
  }
  dimnames(a) <- d
  a
}
The goal of this function is to drop all dimension names from an array. However, I don't know what the following indexing does.
d[[l <- l + 1]]
In this case, d is initially an empty list and l <- 0, so what does d[[0<- 1]] imply?
> x <- matrix(sample(1:5,20,replace=TRUE),nrow = 5)
> x
[,1] [,2] [,3] [,4]
[1,] 5 4 5 3
[2,] 2 1 5 1
[3,] 1 3 4 4
[4,] 3 1 4 3
[5,] 5 3 5 5
> no.dimnames(x)
5 4 5 3
2 1 5 1
1 3 4 4
3 1 4 3
5 3 5 5
It looks like you understand the increment code d[[l <- l + 1]] but are still asking about the empty strings rep("", i). They replace the dimension names with blanks; i is the number of blanks needed for that dimension.
Suppose we had a 4x5 matrix. We would have four row names and five column names. To make them all blank, we would need four empty strings for the rows, rep("", 4), and five for the columns, rep("", 5). The code aims to accomplish that:
mat <- matrix(1:20, 4,5)
rownames(mat) <- month.abb[1:4]
colnames(mat) <- letters[1:5]
mat
# a b c d e
# Jan 1 5 9 13 17
# Feb 2 6 10 14 18
# Mar 3 7 11 15 19
# Apr 4 8 12 16 20
dimnames(mat)
# [[1]]
# [1] "Jan" "Feb" "Mar" "Apr"
#
# [[2]]
# [1] "a" "b" "c" "d" "e"
#What we need
list(rep("", 4), rep("", 5))
# [[1]]
# [1] "" "" "" ""
#
# [[2]]
# [1] "" "" "" "" ""
dimnames(mat) <- list(rep("", 4), rep("", 5))
mat
#
# 1 5 9 13 17
# 2 6 10 14 18
# 3 7 11 15 19
# 4 8 12 16 20
d[[0<- 1]] isn't valid: that would mean assigning 1 to 0, which can't be done. Here l is being set to l + 1, and since l is initially 0 it evaluates as l <- 0 + 1.
Forget about what a is or what it could be.
Just type this into RStudio or whatever you are using, and check each variable to see what happens.
> d <- list()
> l <- 0
> d[[l <- l + 1]] <- rep("", 1)
> d
> l
The one part I should explain is that when you type d[[l <- l + 1]], it assigns l + 1 to l and then uses the new value of l as the index for [[ ]].
So d[[l <- l + 1]] breaks down to this:
l <- l + 1
l <- 0 + 1
l <- 1
[l]
[[l]]
d[[l]]
d[[1]]
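If the counter l is the confusing part, the same function can be written without it. The sketch below is not the original author's code, just an equivalent under the assumption that a is a matrix or array; no_dimnames is a made-up name so it doesn't clash with the original.

```r
no_dimnames <- function(a) {
  # One vector of empty strings per dimension, each as long as that dimension
  dimnames(a) <- lapply(dim(a), function(n) rep("", n))
  a
}

x <- matrix(1:20, nrow = 5)
y <- no_dimnames(x)
dimnames(y)
```

Here lapply() builds the list d in one go, so there is no need to track the position with l <- l + 1.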

Replacing Values in R - Error Received

So I have a data frame (called gen) filled with nucleotide information: each value is either A, C, G, or T. I am looking to replace A with 1, C with 2, G with 3, and T with 4. When I use the function gen[gen==A] = 1, I get the error:
Error in `[<-.data.frame`(`*tmp*`, gen == A, value = 1) :
object 'A' not found
I even tried using gen <- replace(gen, gen == A, 1), but it gives me the same error. Does anyone know how to fix this error? If not, is there a package that I can install in R with a program that will convert A, C, G, and T to numeric values?
Thanks
You need to wrap A in quotes or else R looks for a variable named A.
If the columns are character vectors:
R> gen = data.frame(x = sample(c("A", "C", "G", "T"), 10, replace = TRUE), y = sample(c("A", "C", "G", "T"), 10, replace= TRUE), stringsAsFactors = FALSE)
R> gen[gen == "A"] = 1
R> gen
x y
1 1 1
2 C C
3 G T
4 T T
5 G G
6 G G
7 1 1
8 C C
9 T 1
10 1 1
Also, one way to do them all at once:
R> library(car)
R> sapply(gen, recode, recodes = "'A'=1; 'C'=2; 'G'=3; 'T'=4")
x y
[1,] 1 1
[2,] 2 2
[3,] 3 4
[4,] 4 4
[5,] 3 3
[6,] 3 3
[7,] 1 1
[8,] 2 2
[9,] 4 1
[10,] 1 1
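If the columns are factors (note that since R 4.0.0, data.frame() only creates factors if you pass stringsAsFactors = TRUE):
R> gen = data.frame(x = sample(c("A", "C", "G", "T"), 10, replace = TRUE), y = sample(c("A", "C", "G", "T"), 10, replace= TRUE), stringsAsFactors = TRUE)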
R> sapply(gen, as.numeric)
x y
[1,] 1 1
[2,] 2 4
[3,] 1 2
[4,] 4 1
[5,] 2 2
[6,] 1 4
[7,] 4 3
[8,] 3 3
[9,] 2 4
[10,] 4 2
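If you'd rather not depend on a package or on the factor level order, a named lookup vector in base R should also work. This is only a sketch with a small made-up data frame; codes is a hypothetical name for the lookup vector.

```r
gen <- data.frame(x = c("A", "C", "G", "T"),
                  y = c("T", "G", "C", "A"))

# Named vector acting as a lookup table from nucleotide to number
codes <- c(A = 1, C = 2, G = 3, T = 4)

# as.character() makes this work for both character and factor columns;
# gen[] <- keeps gen a data frame while replacing every column
gen[] <- lapply(gen, function(col) unname(codes[as.character(col)]))
gen
```

Any value not present in codes would come out as NA, which makes unexpected letters easy to spot.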
