creating a matrix/dataframe with two for loops in R - r

This is my first post on SO, so be kind!
My question is vaguely related to this one:
Double for loop in R creating a matrix
I want to create a matrix/dataframe and the approach my mind has chosen is to nest two for loops, one to create the first row and the second to repeat it for the rows I need.
I could successfully create the first loop, but I can't seem to iterate it for the number of rows I need.
I'm sure that there is a better way to do this, anyway, this is the for loop that gives the result I need for the first row:
x <- character(0)
for(j in 1:18){
x <- c(x, sum(it_mat[1, 2:26] == j))
}
it_mat is a matrix of 417 rows and 26 columns, where the first column is a string vector with various names and the subsequent columns are randomly generated numbers from 1 to 18.
Here's the first row:
[1,] "Charlie" "14" "3" "9" "14" "3" "9" "11" "11" "18" "17" "16" "5" "18" "6" "10" "3" "9" "9" "3" "18" "12" "8" "5" "5" "4"
I want to create a matrix/df where I count how many times, for each name, each number appeared.
The for loop I created above gives me the result I want for the first row:
x
[1] "0" "0" "4" "1" "3" "1" "0" "1" "4" "1" "2" "1" "0" "2" "0" "1" "1" "3"
I really can't iterate it for the subsequent rows with another for loop, there must be something very mundane that I do wrong.
This is my best attempt:
tr_mat <- matrix(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat <- rbind(tr_mat, x)
}
I went on it all afternoon and now I give up and reach out to you, before you give me the correct way to do it, please explain what I'm doing wrong in the nested for loops try, I might learn something.
I hope I explained myself, sorry if I've been too verbose.
Thanks for your time.

@RuiBarradesh has pinpointed the actual problem in the OP's last attempt. Here is another way to fix the OP's code, using rbind.
# Do not create rows at this place. Let the rows be added with rbind
tr_mat <- matrix(nrow = 0, ncol = 18) #(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat <- rbind(tr_mat, x)
}
tr_mat # This will display correct result too
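To make the reason for that change concrete, here is a minimal sketch of what the original attempt does. It assumes the test data generated in the DATA section further down; the name `bad` is made up for illustration. Preallocating 147 rows and then also appending with rbind leaves the 147 empty rows in place and stacks the results underneath them.

```r
# Assumed test data (same generator as in the DATA section of this thread)
set.seed(7966)
it_mat <- t(replicate(147, c(sample(letters, 1), sample(18, 25, TRUE))))

# The original attempt: preallocate 147 rows AND append with rbind()
bad <- matrix(, nrow = 147, ncol = 18)   # 147 rows of NA
for (i in 1:147) {
  x <- character()
  for (j in 1:18) {
    x <- c(x, sum(it_mat[i, 2:26] == j))
  }
  bad <- rbind(bad, x)                   # adds a 148th, 149th, ... row
}

dim(bad)                   # 294 18: twice as many rows as intended
all(is.na(bad[1:147, ]))   # TRUE: the preallocated rows are never filled
```

So either start from zero rows and rbind, or keep the preallocation and assign into row i; doing both is what goes wrong.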

Another way, using base R. Note that *apply functions are loops in disguise.
tr_mat2 <- sapply(1:18, function(j) sapply(1:147, function(i) sum(it_mat[i, 2:26] == j)))
Note that this code will produce a matrix of numbers, while your tr_mat is of mode character:
all.equal(tr_mat, tr_mat2)
#[1] "Modes: character, numeric"
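A further possibility, offered only as a sketch: since every value is an integer from 1 to 18, base R's tabulate can count a whole row in one call. The name `tr_mat3` is made up here, and the data is the same assumed test set as in the DATA section below.

```r
# Assumed test data (same generator as in the DATA section)
set.seed(7966)
it_mat <- t(replicate(147, c(sample(letters, 1), sample(18, 25, TRUE))))

# Reference result from the nested sapply above
tr_mat2 <- sapply(1:18, function(j) sapply(1:147, function(i) sum(it_mat[i, 2:26] == j)))

# One tabulate() call per row; apply() returns 18 x 147, so transpose
tr_mat3 <- t(apply(it_mat[, -1], 1, function(r) tabulate(as.integer(r), nbins = 18)))

all(tr_mat3 == tr_mat2)   # TRUE
```

as.integer is needed because it_mat is a character matrix; nbins = 18 keeps a column for numbers that never occur in a row.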
DATA.
This is the dataset generation code that I have used to test the code above.
set.seed(7966) # make the results reproducible
it_mat <- t(replicate(147, c(sample(letters, 1), sample(18, 25, TRUE))))
EDIT.
Following the suggestion in the comments by MKR, here is the OP's code corrected with my modification in the comment to his (the OP's) post.
tr_mat <- matrix(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat[i, ] <- x
}
This is the code that I have used to produce the matrix tr_mat referred to in the all.equal test above.

Do you really need 2 loops? Here is a solution without any loop, using data.table and a combination of the melt/dcast functions:
library(data.table)
# dataset ----------------------------
set.seed(2018)
it_mat<-data.frame(c1=c('Charlie','John','Peter'))
for(i in 2:26){
it_mat[,paste0('c',i)]<-sample(1:18,3)
}
# calculation ----------------------------
it_mat<-data.table(it_mat)
it_mat<-melt(it_mat,id.vars='c1')
it_mat[,.N,by=.(c1,value)][order(c1,value)]
dcast(it_mat[,.N,by=.(c1,value)][order(c1,value)],c1~value)
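For comparison, the same "reshape to long, then count" idea can be sketched in base R with table. This is not the answer's method, just an illustration; `it_df`, `vals`, `nm` and `counts` are made-up names. Wrapping the values in a factor with levels 1:18 keeps a column for every number, whereas dcast would only create columns for values that actually occur.

```r
# Same toy data as above, kept as a plain data.frame
set.seed(2018)
it_df <- data.frame(c1 = c("Charlie", "John", "Peter"))
for (i in 2:26) {
  it_df[[paste0("c", i)]] <- sample(1:18, 3)
}

# "Melt" by hand: unlist() stacks the columns, so the names repeat once per column
vals <- unlist(it_df[-1], use.names = FALSE)
nm   <- rep(it_df$c1, times = ncol(it_df) - 1)

# Cross-tabulate name by value; fixed levels give all 18 columns
counts <- table(nm, factor(vals, levels = 1:18))
counts
```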

Related

Create all possible numeric combinations given n values

I want to create a function such that, given a vector of n numbers, create all possible combinations from them (the same expand.grid() does, really).
For example, given the vector [1, 2] I want to create a function such that it outputs all possible combinations, i.e., "11", "12", "21", "22".
This is what I've come up with:
V <- trunc(runif(3, 0, 9))
FE2 <- function(V){
C <- c()
n <- 0
for(i in 1:length(V)){
for (j in 1:length(V)){
for (k in 1:length(V)){
C <- c(paste0(V[i], V[j], V[k]))
n <- n + 1
print(c(paste0("Number of combination: ", n), paste0("Combination: ", C)))
}
}
}
}
FE2(V)
The function does work. The problem is that for each element of the original vector, I have to add another for(). That is, if I wanted to compute all possible combinations for a vector of 10 elements, say [1, 2, ..., 10], I would have to create 10 for()-loops in my function. I wonder if there's another, more efficient way to do it, as long as it does not differ significantly from my actual solution.
With expand.grid you then need to "collapse" the rows:
apply( expand.grid( 1:2, 1:2), 1, paste0, collapse="")
[1] "11" "21" "12" "22"
I'm not sure if I understood the goals but here's that method applied to 1:10 with 2 and then 3 instances of the vector.
> str(apply( expand.grid( 1:10, 1:10), 1, paste0, collapse="") )
chr [1:100] "11" "21" "31" "41" "51" "61" "71" "81" "91" "101" "12" "22" "32" "42" "52" "62" "72" "82" "92" ...
> str(apply( expand.grid( 1:10, 1:10, 1:10), 1, paste0, collapse="") )
chr [1:1000] "111" "211" "311" "411" "511" "611" "711" "811" "911" "1011" "121" "221" "321" "421" "521" "621" ...
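To avoid typing the vector once per position, the calls above can be wrapped in a small helper. This is only a sketch; `all_combos` and its argument `k` are made-up names, and it assumes you want k copies of the same vector (by default as many as the vector has elements, like the nested loops in the question).

```r
# Build expand.grid(v, v, ..., v) with k copies of v, then collapse each row
all_combos <- function(v, k = length(v)) {
  grid <- do.call(expand.grid, rep(list(v), k))
  apply(grid, 1, paste0, collapse = "")
}

all_combos(1:2)
# [1] "11" "21" "12" "22"

length(all_combos(1:3))   # 27, i.e. 3^3
```

Keep in mind that the result has length(v)^k elements, so with k = 10 on a vector of 10 elements this is 10^10 strings, however the loops are written.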
x <- 1:3
Use Reduce to iteratively apply the same operation
op <- function (a, b) {
paste(rep(a, times = length(b)), rep(b, each = length(a)), sep = "-")
}
Reduce(op, rep(list(x), length(x)))
# [1] "1-1-1" "2-1-1" "3-1-1" "1-2-1" "2-2-1" "3-2-1" "1-3-1" "2-3-1" "3-3-1"
#[10] "1-1-2" "2-1-2" "3-1-2" "1-2-2" "2-2-2" "3-2-2" "1-3-2" "2-3-2" "3-3-2"
#[19] "1-1-3" "2-1-3" "3-1-3" "1-2-3" "2-2-3" "3-2-3" "1-3-3" "2-3-3" "3-3-3"
Generate permutations
library(RcppAlgos)
permuteGeneral(x, length(x), repetition = TRUE,
FUN = function (x) paste0(x, collapse = "-"))

Perform a for() loop to replace those elements of vec_tor that are greater than "5" with the number "5"

Perform a for() loop to replace those elements of vector that are greater than "5" with the number "5".
#can use the functions for() and seq_along()
It doesn't need a loop though
vec_tor[vec_tor > 5] <- 5
Or with replace
replace(vec_tor, vec_tor > 5, 5)
Or with pmin
pmin(vec_tor, 5)
In a for loop, it would be
for(i in seq_along(vec_tor)) {
if(vec_tor[i] > 5) {
vec_tor[i] <- 5
}
}
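A quick sketch to check that the four variants agree, using a made-up sample vector (the question does not show its data). Note that replace and pmin return a new vector rather than modifying vec_tor, so their result has to be assigned.

```r
vec_tor <- c(2, 7, 5, 9, 1)   # hypothetical sample data

# 1. logical indexing (on a copy, so vec_tor stays available for comparison)
by_index <- vec_tor
by_index[by_index > 5] <- 5

# 2. replace() and 3. pmin() both return a modified copy
by_replace <- replace(vec_tor, vec_tor > 5, 5)
by_pmin    <- pmin(vec_tor, 5)

# 4. explicit for loop with seq_along()
by_loop <- vec_tor
for (i in seq_along(by_loop)) {
  if (by_loop[i] > 5) {
    by_loop[i] <- 5
  }
}

by_loop
# [1] 2 5 5 5 1
```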

Combine a list of similar length vectors with NAs to one vector

This is likely a duplicate, yet I appear to be incapable of finding a similar question atm. I have a list of (very long) vectors that are similar in length. Each vector element contains a character. Sometimes multiple vectors contain characters at the same position (sequential numbering from the beginning). Sometimes none contain a character (i.e. all contain NA). There are maybe 10 of these vectors and each has a length of millions of elements. I need to find a quick and memory-efficient way of combining the vectors to a single vector, preferably without using any dependencies (i.e. no data.table or dplyr). The example is simple and short to understand the concept.
I have:
x <- list(A = c(rep("A", 5), rep(NA, 5)), B = c(rep(NA, 4), rep("B", 5), NA))
I need to combine them to:
c(rep("A", 4), "conflict", rep("B", 4), "none")
# "A" "A" "A" "A" "conflict" "B" "B" "B" "B" "none"
Thank you for help. I should know how to do this but somehow it escapes me atm. I do have an apply solution that goes in row by row but that is inefficient. Need to vectorize the solution.
apply(do.call(cbind, x), 1, function(k) {
if(sum(is.na(k)) == length(k)) {
"none"
} else if (sum(!is.na(k)) == 1) {
k[!is.na(k)]
} else {
"conflict"
}
})
This solution uses a vectorized function f and Reduce to apply it to the list. But it assumes that all vectors have the same length. And Reduce is not known for its speed-wise performance.
f <- function(x, y){
na.x <- is.na(x) | x == "none"
na.y <- is.na(y) | y == "none"
x[na.x & na.y] <- "none"
x[!na.x & !na.y & x != y] <- "conflict"
x[!na.x & na.y] <- x[!na.x & na.y]
x[na.x & !na.y] <- y[na.x & !na.y]
x
}
Reduce(f, x)
# [1] "A" "A" "A" "A" "conflict" "B"
# [7] "B" "B" "B" "none"
Reduce(f, list(A=NA, B = NA, C = 'A'))
#[1] "A"
Here's a vectorised version of your code :
dat <- do.call(cbind, x)
#Logical matrix
mat <- !is.na(dat)
#Number of non-NA's in each row
rs <- rowSums(mat)
#First non-NA value
val <- dat[cbind(1:nrow(dat), max.col(mat, ties.method = 'first'))]
#More than 1 non-NA value
val[rs > 1] <- 'conflict'
#Only NA value
val[rs == 0] <- 'none'
val
#[1] "A" "A" "A" "A" "conflict" "B"
#[7] "B" "B" "B" "none"
EDIT - Updated to include a suggestion from @Henrik to avoid nested ifelse, which should make the solution faster.
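The same steps wrapped in a function, as a sketch for reuse on a longer list. `combine_vec` is a made-up name, and like the code above it assumes all vectors in the list have the same length.

```r
x <- list(A = c(rep("A", 5), rep(NA, 5)), B = c(rep(NA, 4), rep("B", 5), NA))

combine_vec <- function(lst) {
  dat <- do.call(cbind, lst)   # one column per vector
  mat <- !is.na(dat)           # TRUE where a character is present
  rs  <- rowSums(mat)          # number of non-NA entries per position
  # value from the first non-NA column (column 1, i.e. NA, if the row is all NA)
  val <- dat[cbind(seq_len(nrow(dat)), max.col(mat, ties.method = "first"))]
  val[rs > 1]  <- "conflict"
  val[rs == 0] <- "none"
  val
}

combine_vec(x)
# [1] "A" "A" "A" "A" "conflict" "B" "B" "B" "B" "none"
```

Everything here works on whole matrices or vectors at once, so there is no per-row R function call as in the apply version from the question.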
Another one
x <- list(A = c(rep("A", 5), rep(NA, 5)), B = c(rep(NA, 4), rep("B", 5), NA))
y <- apply(do.call('rbind', x), 2, function(x) toString(na.omit(x)))
y[!nzchar(y)] <- 'none'
replace(y, grepl(',', y), 'conflict')
# [1] "A" "A" "A" "A" "conflict" "B" "B" "B" "B" "none"

Extract the order list from an ordered correlation matrix: R

I created a correlation matrix and visualized it using the corrplot function with the following code
temp<-matrix(rexp(25, rate=.1), ncol=5)
tempCor<-cor(temp)
tempCor <- data.frame(tempCor)
names(tempCor) <- c(1:5)
corrplot(t(tempCor),method="pie",order="AOE")
Here is the result of the corrplot function
Is there any way to get the order list from this result, which is (4,5,1,3,2)?
Try this:
library(corrplot)
set.seed(1234)
temp <- matrix(rexp(25, rate=.1), ncol=5)
tempCor <- cor(temp)
tempCor <- data.frame(tempCor)
names(tempCor) <- c(1:5)
out <- corrplot(t(tempCor),method="pie",order="AOE")
dimnames(out)
Here is what you are looking for:
[[1]]
[1] "5" "1" "3" "4" "2"
[[2]]
[1] "1" "2" "3" "4" "5"

An efficient way of converting a vector of characters into integers based on frequency in R

I have a vector of characters consisting of only 'a' or 'g', I want to convert them to integers based on frequency, i.e. the more frequent one should be coded to 0, and the other to 1, for example:
set.seed(17)
x = sample(c('g', 'a'), 10, replace=T)
x
# [1] "g" "a" "g" "a" "g" "a" "g" "g" "a" "g"
x[x == names(which.max(table(x)))] = 0
x[x != 0] = 1
x
# [1] "0" "1" "0" "1" "0" "1" "0" "0" "1" "0"
This works, but I wonder if there is a more efficient way to do it.
(We don't have to consider the 50%-50% case here, because it should never happen in our study.)
Use this:
ag.encode <- function(x)
{
result <- x == "a"
if( sum(result) > length(result) %/% 2 ) 1-result else as.numeric(result)
}
If you want to keep the labels in a factor structure, use this instead:
ag.encode2factor <- function(x)
{
result <- x == "a"
if( sum(result) > length(result) %/% 2 )
{
factor(2-result, labels=c("a","g"))
}
else
{
factor(result+1, labels=c("g","a"))
}
}
You can convert your character vector to a factor one. This solution is more general in the sense you don't need to know the name of the 2 characters used to create x.
y <- as.integer(factor(x))-1
if(sum(y)>length(y)/2) y <- as.integer(!y)
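Another sketch along the same lines, closer to the code in the question: compare against the most frequent label directly and convert the logical result to integer. `encode_by_freq` is a made-up name; as in the question, ties are assumed not to occur (with a tie, which.max simply picks the first label in table order).

```r
# 0 for the most frequent label, 1 for everything else
encode_by_freq <- function(x) {
  most_common <- names(which.max(table(x)))
  as.integer(x != most_common)
}

x <- c("g", "a", "g", "a", "g", "a", "g", "g", "a", "g")   # the vector shown in the question
encode_by_freq(x)
# [1] 0 1 0 1 0 1 0 0 1 0
```

Unlike the assignment in the question, this returns an integer vector instead of the character values "0" and "1".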
