Create all possible numeric combinations given n values - r

I want to write a function that, given a vector of n numbers, creates all possible combinations of them (the same thing expand.grid() does, really).
For example, given the vector [1, 2], the function should output all possible combinations, i.e., "11", "12", "21", "22".
This is what I've come up with:
V <- trunc(runif(3, 0, 9))
FE2 <- function(V){
C <- c()
n <- 0
for(i in 1:length(V)){
for (j in 1:length(V)){
for (k in 1:length(V)){
C <- c(paste0(V[i], V[j], V[k]))
n <- n + 1
print(c(paste0("Number of combination: ", n), paste0("Combination: ", C)))
}
}
}
}
FE2(V)
The function does work. The problem is that for each element of the original vector, I have to add another for(). That is, if I wanted to compute all possible combinations for a vector of 10 elements, say [1, 2, ..., 10], I would have to write 10 nested for() loops in my function. I wonder if there's another, more efficient way to do it, as long as it does not differ significantly from my current solution.

With expand.grid you then need to "collapse" the rows:
apply( expand.grid( 1:2, 1:2), 1, paste0, collapse="")
[1] "11" "21" "12" "22"
I'm not sure if I understood the goals but here's that method applied to 1:10 with 2 and then 3 instances of the vector.
> str(apply( expand.grid( 1:10, 1:10), 1, paste0, collapse="") )
chr [1:100] "11" "21" "31" "41" "51" "61" "71" "81" "91" "101" "12" "22" "32" "42" "52" "62" "72" "82" "92" ...
> str(apply( expand.grid( 1:10, 1:10, 1:10), 1, paste0, collapse="") )
chr [1:1000] "111" "211" "311" "411" "511" "611" "711" "811" "911" "1011" "121" "221" "321" "421" "521" "621" ...
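The calls above write out one argument per position. A minimal sketch of how that could be generalized, assuming the goal is every string of length k built from the vector; the names all_combos and k are made up for this illustration and are not part of the answer above:

```r
# Hypothetical helper: every length-k string over the elements of v.
all_combos <- function(v, k = length(v)) {
  # expand.grid() accepts a list of vectors, so k copies of v give all k-tuples
  grid <- expand.grid(rep(list(v), k), stringsAsFactors = FALSE)
  # paste the columns together row by row
  do.call(paste0, unname(as.list(grid)))
}

all_combos(1:2)
# [1] "11" "21" "12" "22"
length(all_combos(1:3))
# [1] 27
```

The result has length(v)^k elements, so a 10-element vector with k = 10 would need 10^10 rows and is unlikely to fit in memory.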

x <- 1:3
Use Reduce to iteratively apply the same operation
op <- function (a, b) {
paste(rep(a, times = length(b)), rep(b, each = length(a)), sep = "-")
}
Reduce(op, rep(list(x), length(x)))
# [1] "1-1-1" "2-1-1" "3-1-1" "1-2-1" "2-2-1" "3-2-1" "1-3-1" "2-3-1" "3-3-1"
#[10] "1-1-2" "2-1-2" "3-1-2" "1-2-2" "2-2-2" "3-2-2" "1-3-2" "2-3-2" "3-3-2"
#[19] "1-1-3" "2-1-3" "3-1-3" "1-2-3" "2-2-3" "3-2-3" "1-3-3" "2-3-3" "3-3-3"
Generate permutations
library(RcppAlgos)
permuteGeneral(x, length(x), repetition = TRUE,
FUN = function (x) paste0(x, collapse = "-"))

Related

Return row names of dataframe meeting condition stored in list using lapply

I have a list containing data frames, each with varying observations. Here is an example of what I am working with:
set.seed(9)
df<- data.frame(x1 = round(runif(80, 0, 5)),
x2 = round(runif(80, 0, 15)),
x3 = sample(letters[1:8], 80, replace = TRUE))
my.list <- vector(mode = "list", length = 8)
my.list <- lapply(unique(df$x3), function(x) {subset(df, x3 == x)})
What I am trying to achieve is to find the row names of each data frame for which a condition is met. Using lapply(), the closest I have come is the code below. However, it returns NAs, and I am not sure why.
> lapply(my.list, function(x) {x <- row.names(x[which(x[,1:2] < 5), ]); x})[[1]]
[1] "1" "5" "11" "22" "46" "53" "61" "63" "64" "79" "80" "NA" "NA.1" "NA.2" "NA.3"
What needs to be done so that only the row names are returned?
Replace your call lapply(my.list, function(x) {x <- row.names(x[which(x[,1:2] < 5), ]); x})
with the one below, so that only indices from a single column (here the first) are passed on to the row.names call:
lapply(my.list, function(x) {x <- row.names(x[which(x[,1:2][1] < 5), ]); x})
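As for why the NAs appear: which() applied to a two-column comparison returns positions counted down the first column and then down the second, so some of them exceed the number of rows. A small sketch on a made-up three-row data frame d (not the data from the question):

```r
# Hypothetical toy data, only to show the indexing behaviour
d <- data.frame(a = c(1, 9, 2), b = c(9, 3, 4))

idx <- which(d[, 1:2] < 5)   # positions in the 3 x 2 logical matrix, column by column
idx
# [1] 1 3 5 6

row.names(d[idx, ])          # 5 and 6 are beyond nrow(d), hence the NA rows
# [1] "1"    "3"    "NA"   "NA.1"

# Keeping the test row-wise avoids the problem:
both_small <- rowSums(d[, 1:2] < 5) == 2
row.names(d)[both_small]
# [1] "3"
```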
Here's one way to do it:
v = lapply(my.list, function(x) {
y <- rowSums(x[,1:2] < 5) == 2
names(y[y == TRUE])
})
print(v)
[[1]]
[1] "1" "13" "65"
[[2]]
[1] "20" "48" "58" "63"
[[3]]
[1] "3" "43"
[[4]]
[1] "5" "12" "24" "77" "80"
[[5]]
[1] "8" "31"
[[6]]
[1] "25"
[[7]]
[1] "17" "19" "23" "49" "60" "62"
[[8]]
[1] "15" "30" "40"

creating a matrix/dataframe with two for loops in R

This is my first post on SO, so be kind!
My question is vaguely related to this one:
Double for loop in R creating a matrix
I want to create a matrix/dataframe, and the approach I have chosen is to nest two for loops: one to create the first row and a second to repeat it for the rows I need.
I could successfully create the first loop, but I can't seem to iterate it for the number of rows I need.
I'm sure that there is a better way to do this, anyway, this is the for loop that gives the result I need for the first row:
x <- character(0)
for(j in 1:18){
x <- c(x, sum(it_mat[1, 2:26] == j))
}
it_mat is a matrix of 147 rows and 26 columns, where the first column is a string vector with various names and the subsequent columns are randomly generated numbers from 1 to 18.
Here's the first row:
[1,] "Charlie" "14" "3" "9" "14" "3" "9" "11" "11" "18" "17" "16" "5" "18" "6" "10" "3" "9" "9" "3" "18" "12" "8" "5" "5" "4"
I want to create a matrix/df that counts, for each name, how many times each number appeared.
The for loop I created above gives me the result I want for the first row:
x
[1] "0" "0" "4" "1" "3" "1" "0" "1" "4" "1" "2" "1" "0" "2" "0" "1" "1" "3"
I really can't iterate it for the subsequent rows with another for loop; there must be something very mundane that I'm doing wrong.
This is my best attempt:
tr_mat <- matrix(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat <- rbind(tr_mat, x)
}
I worked on it all afternoon and now I give up and reach out to you. Before you give me the correct way to do it, please explain what I'm doing wrong in the nested for loop attempt, so that I might learn something.
I hope I explained myself, sorry if I've been too verbose.
Thanks for your time.
@RuiBarradesh has pinpointed the actual problem in the OP's last attempt. There is another way to fix the OP's code, using rbind.
# Do not create rows at this place. Let the rows be added with rbind
tr_mat <- matrix(nrow = 0, ncol = 18) #(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat <- rbind(tr_mat, x)
}
tr_mat # This will display correct result too
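To see why the starting matrix matters, here is a small sketch with made-up dimensions (3 rows, 2 columns) rather than the OP's data. rbind() always appends, so a matrix that already holds NA rows ends up with twice as many rows:

```r
# Pre-filled matrix: the 3 NA rows stay, and 3 more are appended
filled <- matrix(NA, nrow = 3, ncol = 2)
for (i in 1:3) filled <- rbind(filled, c(i, i * 10))
dim(filled)
# [1] 6 2

# Zero-row matrix: only the appended rows remain
empty <- matrix(nrow = 0, ncol = 2)
for (i in 1:3) empty <- rbind(empty, c(i, i * 10))
dim(empty)
# [1] 3 2
```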
Another way, using base R. Note that *apply functions are loops in disguise.
tr_mat2 <- sapply(1:18, function(j) sapply(1:147, function(i) sum(it_mat[i, 2:26] == j)))
Note that this code will produce a matrix of numbers, while your tr_mat is of mode character:
all.equal(tr_mat, tr_mat2)
#[1] "Modes: character, numeric"
DATA.
This is the dataset generation code that I have used to test the code above.
set.seed(7966) # make the results reproducible
it_mat <- t(replicate(147, c(sample(letters, 1), sample(18, 25, TRUE))))
EDIT.
Following the suggestion in the comments by MKR, here is the OP's code corrected with my modification in the comment to his (the OP's) post.
tr_mat <- matrix(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat[i, ] <- x
}
This is the code that I have used to produce the matrix tr_mat referred to in the all.equal test above.
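As a further base R possibility, not taken from the answers here, the inner loop over j can be handed to tabulate(), which counts the integers 1..nbins in one call. A sketch using the test data generated above; the name counts is only illustrative:

```r
set.seed(7966)
it_mat <- t(replicate(147, c(sample(letters, 1), sample(18, 25, TRUE))))

# For each row, count how often each of 1..18 occurs in columns 2 to 26.
# apply() returns one column per row of it_mat, hence the t().
counts <- t(apply(it_mat[, 2:26], 1,
                  function(r) tabulate(as.integer(r), nbins = 18)))
rownames(counts) <- it_mat[, 1]
dim(counts)
# [1] 147  18
```

Unlike tr_mat, the result is an integer matrix, so no conversion from character is needed afterwards.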
Do you really need 2 loops? Here is a solution without any loop, using data.table and a combination of the melt/dcast functions:
library(data.table)
# dataset ----------------------------
set.seed(2018)
it_mat<-data.frame(c1=c('Charlie','John','Peter'))
for(i in 2:26){
it_mat[,paste0('c',i)]<-sample(1:18,3)
}
# calculation ----------------------------
it_mat<-data.table(it_mat)
it_mat<-melt(it_mat,id.vars='c1')
it_mat[,.N,by=.(c1,value)][order(c1,value)]
dcast(it_mat[,.N,by=.(c1,value)][order(c1,value)],c1~value)

R - return to numbers from cut

I have a table with the cuts in intervals like:
bin      targets  casos  prop    phyp  logit
(-2,-1]  193      6144   0.0314  0     -3.4286244
(-1,3]   128      431    0.2970  1     -0.8617025
(3,11]   137      245    0.5592  1      0.2378497
I want to get the original cuts. I tried with:
a<-strsplit(as.character(pl$table[,'bin']), ' ')
And then I tried to split each row with:
lapply(a, function(x) strsplit(x, ",")[1] )
But I don't get the expected result, which is:
(-1,3,11)
Is there a better way to achieve this? What else do I need to do to get to the result?
Thanks.
If your data is consistently in this format, you could use gsub().
df <- data.frame(bin = c('(-2,-1]','(1,3]','(3,11]'),
targets = c(193, 128, 137),
casos = c(6144, 431, 245),
prop = c(0.0314, 0.297, 0.5592),
phyp = c(0,1,1),
logit = c(-3.4286244,-0.8617025, 0.2378497), stringsAsFactors = F)
a <- strsplit(df$bin, ',')
sapply(a, function(x) gsub("]", "", x))[2,]
sapply(a, function(x) gsub("\\(", "", x))[1,]
Which gives you
[1] "-1" "3" "11"
[1] "-2" "1" "3"
In your example, there are more bounds than you say you are hoping to retrieve. This will give you all bounds:
d <- read.table(text=' bin targets casos prop phyp logit
"(-2,-1]" 193 6144 0.0314 0 -3.4286244
"(1,3]" 128 431 0.2970 1 -0.8617025
"(3,11]" 137 245 0.5592 1 0.2378497', header=T)
strings <- as.character(d$bin) # works whether bin was read as a factor or as character
strings <- substr(strings, 2, nchar(strings)-1)
unique(unlist(strsplit(strings, ",")))
# [1] "-2" "-1" "1" "3" "11"
If you only wanted the upper bounds, this will work:
strings <- as.character(d$bin) # works whether bin was read as a factor or as character
strings <- sapply(strsplit(strings, ","), function(l){ l[2] })
strings <- substr(strings, 1, nchar(strings)-1)
unique(strings)
# [1] "-1" "3" "11"
Another way would be:
a<-strsplit(as.character(pl$table[,'bin']), ' ')
lapply(a, function(x) unlist(strsplit(x, ",|]"))[2])
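The approaches above return character strings. If numeric break points are wanted, a regular expression plus as.numeric() is one option. A sketch assuming labels of the form "(lower,upper]" written in plain decimal notation; the vector bins just repeats the labels from the question:

```r
bins <- c("(-2,-1]", "(-1,3]", "(3,11]")

# Upper bounds: keep what sits between the last comma and the closing bracket
upper <- as.numeric(sub(".*,(.*)\\]$", "\\1", bins))
upper
# [1] -1  3 11

# All distinct break points: drop everything except digits, commas, dots and minus signs
breaks <- sort(unique(as.numeric(unlist(
  strsplit(gsub("[^0-9,.-]", "", bins), ",")
))))
breaks
# [1] -2 -1  3 11
```

Labels printed in scientific notation (for example 1e+03) would need a different pattern.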

MAPPLY across a vector in R

I've seen several examples on Google, but I still don't quite understand how it works.
Here's what I'm trying to do.
I have a character vector:
> V <- c("aa","bb","cc","dd","ee","ff")
> V
[1] "aa" "bb" "cc" "dd" "ee" "ff"
I would like as output a vector of length length(V)-2 (=4)
composed of
[1] "aabbcc" "bbccdd" "ccddee" "ddeeff"
that is, a vector with the concatenations of 3 successive elements of V.
I'm thinking of using something like mapply:
mapply(function(x,i){paste(x[i:i+2],sep="",collapse="")},V,1:(length(V)-2))
but that's not the right syntax.
Thanks.
Here's a solution in case your project needs many successive elements. There are other approaches; mapply is just one:
mapply(function(x,y) paste(V[x:y], collapse=""), 1:(length(V)-2), 3:length(V))
#[1] "aabbcc" "bbccdd" "ccddee" "ddeeff"
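One reason the attempt in the question fails is operator precedence: in R the colon binds tighter than +, so i:i+2 means (i:i)+2. A short sketch of the difference, followed by a sapply() variant over the window start index (an alternative to the mapply call above, not a replacement for it):

```r
V <- c("aa", "bb", "cc", "dd", "ee", "ff")

i <- 2
wrong <- i:i+2      # parsed as (i:i) + 2, i.e. the single index 4
right <- i:(i + 2)  # the intended window 2 3 4

# Iterate over the start index of each window of 3 elements
windows <- sapply(seq_len(length(V) - 2),
                  function(i) paste(V[i:(i + 2)], collapse = ""))
windows
# [1] "aabbcc" "bbccdd" "ccddee" "ddeeff"
```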
As per your comments, you can create a function and use lapply for a list:
paste2 <- function(vec, n=3) {
mapply(function(x,y) paste(vec[x:y], collapse=""), 1:(length(vec)-(n-1)), n:length(vec))
}
## single vector still works
paste2(V)
#[1] "aabbcc" "bbccdd" "ccddee" "ddeeff"
## with list
lst <- rep(list(V), 2)
lapply(lst, paste2)
#[[1]]
#[1] "aabbcc" "bbccdd" "ccddee" "ddeeff"
#
#[[2]]
#[1] "aabbcc" "bbccdd" "ccddee" "ddeeff"
You don't need any fancy mapply for this:
n = length(V)
paste0(V[1:(n - 2)], V[2:(n - 1)], V[3:n])
Here's a parametric solution; you still don't need mapply:
i = 3
apply(matrix(V, nrow = length(V) + 1, ncol = i)[1:(length(V) - i + 1), ],
MARGIN = 1, FUN = paste, collapse = "")
You could functionalize this:
f = function(V, i) {
apply(matrix(V, nrow = length(V) + 1, ncol = i)[1:(length(V) - i + 1), ],
MARGIN = 1, FUN = paste, collapse = "")
}
You could then apply it to a list of vectors like this:
lapply(list(c("a", "b", "c", "d"), letters), f, i = 3)
# [[1]]
# [1] "abc" "bcd"
#
# [[2]]
# [1] "abc" "bcd" "cde" "def" "efg" "fgh" "ghi" "hij" "ijk" "jkl" "klm" "lmn" "mno" "nop" "opq"
# [16] "pqr" "qrs" "rst" "stu" "tuv" "uvw" "vwx" "wxy" "xyz"
You would need mapply (and you could use it with the function) if you had several different vectors and for each vector you wanted a concatenation of a different number of elements.
Here's another alternative using apply() and embed()
rev(apply(embed(rev(V),3), 1, paste, collapse=""))
# [1] "aabbcc" "bbccdd" "ccddee" "ddeeff"

how to output the adjacency group from edge list?

Edge list:
el <- rbind(c(6,11), c(1,8), c(8,12), c(11,17),
c(6,7), c(7,11), c(18,19), c(6,16))
In this case, 6,11,17,7,16 connect, 1,8,12 connect, and 18,19 connect with each other. So there are three groups. How can I get these groups from this edge list using R?
You are looking for connected components: http://en.wikipedia.org/wiki/Connected_component_(graph_theory). This is (somewhat confusingly) called clusters in igraph. Getting the list of groups takes a few steps:
library(igraph)
g <- graph.data.frame(data.frame(el))
clu <- clusters(g)
comp <- tapply(seq_along(clu$membership), clu$membership, simplify = FALSE, function(x) x)
lapply(comp, function(x) V(g)$name[x])
# $`1`
# [1] "6" "11" "7" "17" "16"
#
# $`2`
# [1] "1" "8" "12"
#
# $`3`
# [1] "18" "19"
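For comparison, the same grouping can be sketched in base R without igraph. This is a simple label-propagation loop written for illustration, not what igraph does internally; the names vertices, label and groups are made up here:

```r
el <- rbind(c(6,11), c(1,8), c(8,12), c(11,17),
            c(6,7), c(7,11), c(18,19), c(6,16))

# Start with every vertex labelled by itself, then keep giving both ends of
# each edge the smaller of their two labels until a full pass changes nothing.
vertices <- sort(unique(as.vector(el)))
label <- setNames(vertices, vertices)
repeat {
  changed <- FALSE
  for (k in seq_len(nrow(el))) {
    a <- as.character(el[k, 1])
    b <- as.character(el[k, 2])
    lo <- min(label[a], label[b])
    if (label[a] != lo || label[b] != lo) {
      label[c(a, b)] <- lo
      changed <- TRUE
    }
  }
  if (!changed) break
}

# Vertices sharing a label belong to the same connected component
groups <- unname(split(vertices, label))
groups
# [[1]]
# [1]  1  8 12
#
# [[2]]
# [1]  6  7 11 16 17
#
# [[3]]
# [1] 18 19
```

The loop repeats whole passes over the edge list, so it is fine for small inputs; for large graphs igraph is the better choice.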
