R: fill list into column based on index

I have several sets of data of differing lengths that I am trying to combine into one ordered data structure.
At present I am trying to insert the lists into a regular matrix, filling by index number,
with the following code:
mat <- matrix(NA,nrow=5,ncol=6)
mat[,1] <- LETTERS[1:5]
vec1 <- c("B","D","E")
vec2 <- c("A","C","E")
m1 <- match(mat[,1],vec1)
m2 <- match(mat[,1],vec2)
x1 <- which(!is.na(m1))
x2 <- which(!is.na(m2))
I would like to know how to proceed to get:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A" NA "A" NA NA NA
[2,] "B" "B" NA NA NA NA
[3,] "C" NA "C" NA NA NA
[4,] "D" "D" NA NA NA NA
[5,] "E" "E" "E" NA NA NA
Any suggestions or hints please?
Thanks,
Matt

Try
mat[match(vec1, mat[,1]), 2] <- vec1
mat[match(vec2, mat[,1]), 3] <- vec2
mat
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "A" NA "A" NA NA NA
# [2,] "B" "B" NA NA NA NA
# [3,] "C" NA "C" NA NA NA
# [4,] "D" "D" NA NA NA NA
# [5,] "E" "E" "E" NA NA NA
Or
mat[mat[, 1] %in% vec1, 2] <- vec1
mat[mat[, 1] %in% vec2, 3] <- vec2
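One caveat with the %in% version, as far as I can tell: it fills the matching rows in row order, so it only gives the right answer when the vector is already sorted the same way as column 1. A small sketch with a hypothetical unsorted vector (vec is my own example, not from the question) shows the difference from match():

```r
# Hypothetical example: a lookup vector that is NOT in the same order as column 1
mat <- matrix(NA, nrow = 5, ncol = 3)
mat[, 1] <- LETTERS[1:5]
vec <- c("E", "B", "D")

# match() returns the row of each element of vec, so values land in the right rows
by_match <- mat
by_match[match(vec, by_match[, 1]), 2] <- vec

# %in% only marks WHICH rows match; values are then filled top to bottom
by_in <- mat
by_in[by_in[, 1] %in% vec, 2] <- vec

by_match[, 2]
# [1] NA  "B" NA  "D" "E"
by_in[, 2]
# [1] NA  "E" NA  "B" "D"
```

So with unsorted input the match() form is the safer of the two.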
Or a more general approach
mylist <- list(vec1, vec2)
indx <- unlist(lapply(seq_len(length(mylist) + 1)[-1],
function(x) match(mylist[[x-1]], mat[, 1])))
indx2 <- rep(seq_len(length(mylist) + 1)[-1], sapply(mylist , length))
mat[cbind(indx, indx2)] <- unlist(mylist)
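The matrix-indexing idea above can be wrapped in a small helper. This is only a sketch: fill_columns and its first_col argument are names I made up, and it assumes column 1 holds unique keys. Values that have no matching key are silently dropped.

```r
# Hypothetical helper: put each vector of `vecs` into its own column of `mat`,
# placing every value in the row whose first-column key equals that value.
fill_columns <- function(mat, vecs, first_col = 2) {
  rows <- unlist(lapply(vecs, match, table = mat[, 1]))    # row of each value
  cols <- rep(seq_along(vecs) + first_col - 1, lengths(vecs))  # target column
  keep <- !is.na(rows)                                     # drop unmatched values
  mat[cbind(rows[keep], cols[keep])] <- unlist(vecs)[keep]
  mat
}

mat <- matrix(NA, nrow = 5, ncol = 6)
mat[, 1] <- LETTERS[1:5]
res <- fill_columns(mat, list(c("B", "D", "E"), c("A", "C", "E")))
res[, 1:3]
```

Indexing with a two-column matrix (row, column) assigns all cells in one step, so no loop over the vectors is needed.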

You could also do this if you have more columns to match:
mat1 <- mat
indx <- which(outer(mat[,1], c(vec1, vec2), "=="), arr.ind=TRUE)[,1]
indx1 <- rep(1:2, c(length(vec1), length(vec2))) +1
mat[cbind(indx, indx1)] <- c(vec1,vec2)
mat
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] "A" NA "A" NA NA NA
#[2,] "B" "B" NA NA NA NA
#[3,] "C" NA "C" NA NA NA
#[4,] "D" "D" NA NA NA NA
#[5,] "E" "E" "E" NA NA NA
Suppose you need to fill a couple more columns by matching additional vectors:
vec3 <- c("A", "D")
vec4 <- c("B", "C")
veclist <- mget(ls(pattern="vec\\d"))
unlist(veclist, use.names=FALSE)
#[1] "B" "D" "E" "A" "C" "E" "A" "D" "B" "C"
indx <- which(outer(mat[,1], unlist(veclist), "=="), arr.ind=TRUE)[,1]
indx1 <- rep(seq(length(veclist))+1, sapply(veclist, length))
mat1[cbind(indx, indx1)] <- unlist(veclist)
mat1
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] "A" NA "A" "A" NA NA
#[2,] "B" "B" NA NA "B" NA
#[3,] "C" NA "C" NA "C" NA
#[4,] "D" "D" NA "D" NA NA
#[5,] "E" "E" "E" NA NA NA

Related

What's the most economical way to fill a matrix with random values?

I'm trying to find the most economical and elegant code for a simple task: fill an empty matrix with randomly sampled values (here, A, B, or C). For illustration, let's take this matrix:
x <- matrix(NA, nrow=8, ncol=4)
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
[5,] NA NA NA NA
[6,] NA NA NA NA
[7,] NA NA NA NA
[8,] NA NA NA NA
To fill it I've used two codes so far, each successfully doing the job. The first uses sapply:
x[] <- sapply(x, function(i) sample(LETTERS[1:3], 1, replace = F))
x
[,1] [,2] [,3] [,4]
[1,] "C" "A" "B" "C"
[2,] "B" "B" "B" "B"
[3,] "A" "B" "B" "B"
[4,] "B" "C" "A" "C"
[5,] "B" "A" "C" "A"
[6,] "A" "B" "C" "A"
[7,] "A" "C" "C" "A"
[8,] "C" "B" "B" "C"
while the second is a for loop:
for(i in 1:nrow(x)){
x[i,] <- sample(LETTERS[1:3], 4, replace = T)
}
x
[,1] [,2] [,3] [,4]
[1,] "C" "A" "C" "C"
[2,] "C" "A" "B" "B"
[3,] "C" "C" "A" "B"
[4,] "C" "C" "A" "C"
[5,] "A" "C" "C" "C"
[6,] "B" "C" "A" "A"
[7,] "C" "C" "B" "A"
[8,] "B" "C" "B" "C"
I like neither of them as they both look bulky. Is there a better way to get the expected result, that is, is there a shorter and/or more elegant way?
How about assigning it directly?
x[] <- sample(LETTERS, length(x), replace = TRUE)
x
# [,1] [,2] [,3] [,4]
#[1,] "A" "H" "V" "A"
#[2,] "X" "M" "Y" "O"
#[3,] "A" "W" "N" "I"
#[4,] "H" "Y" "Y" "C"
#[5,] "W" "N" "O" "P"
#[6,] "Y" "H" "P" "J"
#[7,] "I" "Y" "N" "H"
#[8,] "S" "F" "Z" "I"
If you want to include only the first three LETTERS, this would work:
x[] <- sample(LETTERS[1:3], length(x), replace = TRUE)
We can use replace to get the same result without changing the original matrix:
replace(x, TRUE, sample(LETTERS, length(x), replace = TRUE))
# [,1] [,2] [,3] [,4]
#[1,] "B" "O" "S" "D"
#[2,] "N" "C" "Q" "Z"
#[3,] "X" "X" "Z" "X"
#[4,] "O" "G" "R" "R"
#[5,] "L" "B" "S" "U"
#[6,] "Y" "I" "O" "A"
#[7,] "L" "Y" "P" "M"
#[8,] "R" "X" "H" "T"
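If the empty matrix is not needed beforehand, another option is to build the filled matrix in a single call. A minimal sketch; the seed value is arbitrary and is only there to make the draw reproducible:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Draw nrow * ncol values and let matrix() shape them (filled column by column)
x <- matrix(sample(LETTERS[1:3], 8 * 4, replace = TRUE), nrow = 8, ncol = 4)

dim(x)
all(x %in% LETTERS[1:3])
```

Since sample() is called once for all cells, this avoids both the sapply and the loop.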

R - filtering Matrix based off True/False vector

I have a data structure that can contain both vectors and matrices. I want to filter it based on a TRUE/FALSE vector. I can't figure out how to filter both kinds of element successfully.
result <- structure(list(aba = c(1, 2, 3, 4), beta = c("a", "b", "c", "d"),
chi = structure(c(0.438148361863568, 0.889733991585672, 0.0910745360888541,
0.0512442977633327, 0.812013201415539, 0.717306115897372, 0.995319503592327,
0.758843480376527, 0.366544214077294, 0.706843026448041, 0.108310810523108,
0.225777650484815, 0.831163870869204, 0.274351604515687, 0.323493955424055,
0.351171918679029), .Dim = c(4L, 4L))), .Names = c("aba", "beta", "chi"))
> result
$aba
[1] 1 2 3 4
$beta
[1] "a" "b" "c" "d"
$chi
[,1] [,2] [,3] [,4]
[1,] 0.43814836 0.8120132 0.3665442 0.8311639
[2,] 0.88973399 0.7173061 0.7068430 0.2743516
[3,] 0.09107454 0.9953195 0.1083108 0.3234940
[4,] 0.05124430 0.7588435 0.2257777 0.3511719
tf <- c(T,F,T,T)
What I would like to do is something like
> lapply(result,function(x) {ifelse(tf,x,NA)})
$aba
[1] 1 NA 3 4
$beta
[1] "a" NA "c" "d"
$chi
[1] 0.43814836 NA 0.09107454 0.05124430
but the $chi matrix structure is lost.
The result I'd expect is
ifelse(matrix(tf,ncol=4,nrow=4),result$chi,NA)
[,1] [,2] [,3] [,4]
[1,] 0.43814836 0.8120132 0.3665442 0.8311639
[2,] NA NA NA NA
[3,] 0.09107454 0.9953195 0.1083108 0.3234940
[4,] 0.05124430 0.7588435 0.2257777 0.3511719
The problem I'm having trouble solving is how to match the tf vector to the data. It feels like I need a conditional based on data type, which I'd like to avoid. Thoughts and answers are appreciated.
I don't see how you can avoid either checking the data type or the "dimensions" of the data. As such, I would propose something like:
lapply(result, function(x) {
if (is.null(dim(x))) x[!tf] <- NA else x[!tf, ] <- NA
x
})
# $aba
# [1] 1 NA 3 4
#
# $beta
# [1] "a" NA "c" "d"
#
# $chi
# [,1] [,2] [,3] [,4]
# [1,] 0.43814836 0.8120132 0.3665442 0.8311639
# [2,] NA NA NA NA
# [3,] 0.09107454 0.9953195 0.1083108 0.3234940
# [4,] 0.05124430 0.7588435 0.2257777 0.3511719
This seems fairly simple:
is.na(tf) <- !tf # convert FALSE to NA
result$chi[ tf, ] # and use the default behavior of "[" with NA arg
[,1] [,2] [,3] [,4]
[1,] 0.43814836 0.8120132 0.3665442 0.8311639
[2,] NA NA NA NA
[3,] 0.09107454 0.9953195 0.1083108 0.3234940
[4,] 0.05124430 0.7588435 0.2257777 0.3511719
But now I see that you wanted NAs at the corresponding positions of the atomic vectors. Unfortunately "[" with the additional NULL argument would error out on that type of object.
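To illustrate that last point, here is a small sketch with made-up data (v, m and idx are my own names). An NA in a logical index yields NA for vectors and a whole row of NA for matrices, but the two need different call forms, which is why a check on dim() is still required:

```r
tf <- c(TRUE, FALSE, TRUE, TRUE)
idx <- tf
is.na(idx) <- !idx            # FALSE becomes NA, TRUE stays TRUE

v <- c(1, 2, 3, 4)            # made-up atomic vector
m <- matrix(1:16, nrow = 4)   # made-up matrix

v[idx]     # vector form: NA at position 2
m[idx, ]   # matrix form: row 2 is all NA

# The matrix form on a plain vector fails, so one call cannot serve both
err <- tryCatch(v[idx, ], error = function(e) conditionMessage(e))
err
```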

R: transposing and splitting a row with a delimiter.

I have a table
rawData <- as.data.frame(matrix(c(1,2,3,4,5,6,"a,b,c","d,e","f"),nrow=3,ncol=3))
1 4 a,b,c
2 5 d,e
3 6 f
I would like to convert to
1 2 3
4 5 6
a d f
b e
c
So far I can transpose and split the third column; however, I'm lost as to how to reconstruct a new table in the format outlined above.
new = t(rawData)
for (e in 1:ncol(new)){
s<-strsplit(new[3:3,e], split=",")
print(s)
}
I tried creating new vectors for each iteration, but I'm not sure how to efficiently put each one back into a data frame. I would be grateful for any help. Thanks!
You can use stri_list2matrix from the stringi package:
library(stringi)
rawData <- as.data.frame(matrix(c(1,2,3,4,5,6,"a,b,c","d,e","f"),nrow=3,ncol=3),stringsAsFactors = F)
d1 <- t(rawData[,1:2])
rownames(d1) <- NULL
d2 <- stri_list2matrix(strsplit(rawData$V3,split=','))
rbind(d1,d2)
# [,1] [,2] [,3]
# [1,] "1" "2" "3"
# [2,] "4" "5" "6"
# [3,] "a" "d" "f"
# [4,] "b" "e" NA
# [5,] "c" NA NA
You can also use cSplit from my "splitstackshape" package.
By default, it just creates additional columns after splitting the input:
library(splitstackshape)
cSplit(rawData, "V3")
# V1 V2 V3_1 V3_2 V3_3
# 1: 1 4 a b c
# 2: 2 5 d e NA
# 3: 3 6 f NA NA
You can just transpose that to get your desired output.
t(cSplit(rawData, "V3"))
# [,1] [,2] [,3]
# V1 "1" "2" "3"
# V2 "4" "5" "6"
# V3_1 "a" "d" "f"
# V3_2 "b" "e" NA
# V3_3 "c" NA NA
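If you would rather not load a package, the same shape can probably be built in base R by padding each split piece with NA. A rough sketch, assuming the columns are character (I build rawData with explicit column names here for clarity):

```r
rawData <- data.frame(V1 = c("1", "2", "3"),
                      V2 = c("4", "5", "6"),
                      V3 = c("a,b,c", "d,e", "f"),
                      stringsAsFactors = FALSE)

parts <- strsplit(rawData$V3, split = ",", fixed = TRUE)
n <- max(lengths(parts))                       # longest split piece

# Pad every piece to length n with NA, then stack the pieces as columns
padded <- matrix(unlist(lapply(parts, function(p) { length(p) <- n; p })),
                 nrow = n)

res <- rbind(unname(t(as.matrix(rawData[, 1:2]))), padded)
res
```

Assigning to length() extends a vector with NA, which is what lines the shorter pieces up with the longest one.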

Replace NA with empty string in a list

I have a large list of matrix data that looks like this:
$`1`
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
2010 "6 811 529 000" NA NA NA "455 782 000"
2011 "7 531 264 000" NA NA NA "585 609 000"
2012 "8 013 843 000" NA NA NA "702 256 000"
and I would like to replace each NA with an empty string, like this: ""
The solution must not involve conversion to a data.frame, in which case x[is.na(x)] <- ""
would solve the issue.
This works for me: print(x, na.print = "") but I cannot figure out how to store the print output.
You can do this with lapply:
# Set up a sample list of matrices
dat = list(matrix(c(NA, "a", "b", NA), nrow=2),
matrix(c(rep("r", 8), NA), nrow=3))
dat
# [[1]]
# [,1] [,2]
# [1,] NA "b"
# [2,] "a" NA
#
# [[2]]
# [,1] [,2] [,3]
# [1,] "r" "r" "r"
# [2,] "r" "r" "r"
# [3,] "r" "r" NA
# Do conversion
dat <- lapply(dat, function(x) { x[is.na(x)] <- "" ; x })
dat
# [[1]]
# [,1] [,2]
# [1,] "" "b"
# [2,] "a" ""
#
# [[2]]
# [,1] [,2] [,3]
# [1,] "r" "r" "r"
# [2,] "r" "r" "r"
# [3,] "r" "r" ""
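One thing to keep in mind: a matrix holds a single type, so this is lossless only for character matrices like the ones in the question. A short sketch with a made-up numeric matrix (num and out are my own names) shows the coercion, here using replace(), which returns a modified copy:

```r
num <- matrix(c(1, NA, 3, 4), nrow = 2)   # made-up numeric matrix

# Putting "" into a numeric matrix converts every cell to character
out <- replace(num, is.na(num), "")

typeof(num)   # "double"
typeof(out)   # "character"
out
```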

Combination without repetition in R

I am trying to get all the possible combinations of length 3 of the elements of a variable. Although it partly worked with combn() I did not quite get the output I was looking for. Here's my example
x <- c("a","b","c","d","e")
t(combn(c(x,x), 3))
The output I get looks like this
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "b" "d"
[3,] "a" "b" "e"
I am not really happy with this command for two reasons. First, I wanted output of the form "a+b+c" "a+b+b" ..., but I wasn't able to edit the output with paste() or similar.
Second, I want only one combination for each set of letters; that is, I should get either "a+b+c" or "b+a+c", but not both.
Try something like:
x <- c("a","b","c","d","e")
d1 <- combn(x,3) # All combinations
d1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "a" "a" "a" "a" "a" "a" "b" "b" "b" "c"
# [2,] "b" "b" "b" "c" "c" "d" "c" "c" "d" "d"
# [3,] "c" "d" "e" "d" "e" "e" "d" "e" "e" "e"
nrow(unique(t(d1))) == nrow(t(d1))
# [1] TRUE
d2 <- expand.grid(x,x,x) # All ordered triples (repetition allowed)
d2
# Var1 Var2 Var3
# 1 a a a
# 2 b a a
# 3 c a a
# 4 d a a
# 5 e a a
# 6 a b a
# 7 b b a
# 8 c b a
# 9 d b a
# ...
nrow(unique(d2)) == nrow(d2)
# [1] TRUE
Try this:
x <- c("a","b","c","d","e")
expand.grid(rep(list(x), 3))
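To get the "a+b+c" style strings, the pieces above can be combined with paste(). This is a sketch under my reading of the question: no_rep treats each letter as usable once, while with_rep also allows repeats such as "a+b+b" but keeps only one ordering of each set (both variable names are mine).

```r
x <- c("a", "b", "c", "d", "e")

# Each letter at most once: combn() applies FUN to every combination
no_rep <- combn(x, 3, FUN = paste, collapse = "+")
length(no_rep)    # choose(5, 3) = 10

# Repeats allowed: build all ordered triples, sort within each row so that
# "b+a+c" and "a+b+c" collapse to the same string, then drop duplicates
g <- expand.grid(rep(list(x), 3), stringsAsFactors = FALSE)
with_rep <- unique(apply(g, 1, function(r) paste(sort(r), collapse = "+")))
length(with_rep)  # choose(7, 3) = 35
```

Sorting within each row before pasting is what removes the order-only duplicates.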

Resources