I want to calculate combinations in R.
I want to get results like those from the code below, but in my real code the number of nested for loops depends on the number of blocks (e.g., length(ncomb)).
How can I make the number of nested for loops variable?
Or is there a better way to calculate the combinations that I want?
#Block
nblock = c(1,2,3)
num_nblock = length(nblock)
#Position
tol = c(1:6)
total = length(tol)
#Calculate number of Combination
#6C1*5C2*3C3
t1 = total
ncomb=c()
for (i in 1:num_nblock) {
ncomb[i] = choose(t1,nblock[i])
t1 = t1-nblock[i]
}
#Calculate Combination
Clist = data.frame()
for (i in 1:ncomb[1]) {
comb1 = combn(total,nblock[1])
remain = setdiff(tol,comb1[,i])
for (j in 1:ncomb[2]) {
comb2 = combn(remain,nblock[2])
remain2 = setdiff(remain,comb2[,j])
for (k in 1:ncomb[3]) {
comb3 = combn(remain2,nblock[3])
ans = c(comb1[,i],comb2[,j],comb3[,k])
Clist =rbind(Clist,ans)
}
}
}
#Result :Clist
X1L X2L X3L X4L X5L X6L
1 1 2 3 4 5 6
2 1 2 4 3 5 6
3 1 2 5 3 4 6
4 1 2 6 3 4 5
5 1 3 4 2 5 6
6 1 3 5 2 4 6
7 1 3 6 2 4 5
8 1 4 5 2 3 6
9 1 4 6 2 3 5
10 1 5 6 2 3 4
.....
50 5 4 6 1 2 3
51 6 1 2 3 4 5
52 6 1 3 2 4 5
53 6 1 4 2 3 5
54 6 1 5 2 3 4
55 6 2 3 1 4 5
56 6 2 4 1 3 5
57 6 2 5 1 3 4
58 6 3 4 1 2 5
59 6 3 5 1 2 4
60 6 4 5 1 2 3
Here is an idea that may be harder to follow but solves your problem of needing a variable number of for loops.
Before I show my code, let me explain the idea using your example of dividing 1 through 6 into blocks of 1, 2, and 3. As you said, the total number of combinations is 6C1*5C2*3C3=60. The question is how to fill in the 60 rows.
Think of a tree from Block 1 to Block 3: each branch of Block 1 corresponds to 5C2 branches of Block 2, and each branch of Block 2 corresponds to 3C3 branches of Block 3, so the total number of leaves is 6C1*5C2*3C3=60. To fill the output matrix, repeat each branch of Block 1 5C2*3C3 times, each branch of Block 2 3C3 times, and let each branch of Block 3 appear once. In short, repeat each branch as many times as the product of the "cardinalities" of the blocks to its right.
This is what the following code is doing.
# ++++ Using your example and initialization ++++
# Block
nblock = c(1,2,3)
num_nblock = length(nblock)
# Position
tol = c(1:6)
total = length(tol)
t1 = total
ncomb=c()
for (i in 1:num_nblock) {
ncomb[i] = choose(t1,nblock[i])
t1 = t1-nblock[i]
}
# ++++++++
# Initialize result matrix
Clist = matrix(nrow = prod(ncomb), ncol = total)
# Block col ID: produce list of (1),(2,3),(4,5,6) as col ID of output matrix
block_cols = list()
start = 1
for (i in 1:num_nblock) {
block_cols[[i]] = start:(start+nblock[i]-1)
start = start + nblock[i]
}
# Fill the output matrix: iterate each (row,block) of matrix
for (i in 1:prod(ncomb)) {
for (j in 1:num_nblock) {
# First col ID of each block. In this example, always 1, 2, 4
block_first_col_id = block_cols[[j]][1]
# Fill the position only while it is still NA
if (is.na(Clist[i, block_first_col_id])) {
# Candidates: all numbers not already used in the blocks to the left
remain = setdiff(tol, Clist[i, 0:(block_first_col_id-1)])
com = combn(remain, nblock[j])
# Key step: repeat each combination once per branch of the blocks to the right
# (the last block has nothing to its right, so repeat once)
reps = if (j < num_nblock) prod(ncomb[(j+1):num_nblock]) else 1
filler = apply(com, 1, function(x) rep(x, each = reps))
# Store filler to output.
# Filler may be a vector, in which case dim will return NULL
filler_nrow = ifelse(is.null(dim(filler)[1]), 1, dim(filler)[1])
Clist[i:(i + filler_nrow - 1), block_cols[[j]]] = filler
}
}
}
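For readers who prefer recursion, the same tree idea can be sketched without pre-sizing the matrix. This is only an illustration: block_combinations is a hypothetical helper name, not part of the answer above, and it assumes the block sizes sum to the length of the pool.

```r
# Hypothetical helper: enumerate every way to split `pool` into
# consecutive blocks of the given sizes, one row per arrangement.
block_combinations <- function(pool, sizes) {
  if (length(sizes) == 0) {
    # No blocks left: a single empty row to attach to
    return(matrix(integer(0), nrow = 1, ncol = 0))
  }
  # Choose positions rather than values, so a pool holding a single
  # number is not expanded by combn() into 1:number
  picks <- combn(length(pool), sizes[1])
  rows <- lapply(seq_len(ncol(picks)), function(k) {
    chosen <- pool[picks[, k]]
    rest <- block_combinations(pool[-picks[, k]], sizes[-1])
    # Repeat the chosen block once per arrangement of the remaining blocks
    cbind(matrix(chosen, nrow = nrow(rest), ncol = length(chosen), byrow = TRUE),
          rest)
  })
  do.call(rbind, rows)
}

Clist2 <- block_combinations(1:6, c(1, 2, 3))
dim(Clist2)
```

The recursion depth equals the number of blocks, so nblock can grow or shrink without adding or removing for loops.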
I want to obtain a list of mutually connected nodes between every pair of nodes in my graph:
library(igraph)
G <- graph(c(1,2,1,3,1,4,2,4, 2,3,2,5,3,5,4,5,5,6,5,7,7,8,7,9), directed=F)
plot(G)
The edges are undirected.
In this graph, for instance, nodes 1 and 2 share common neighbors 3 and 4, and nodes 1 and 3 share common neighbor 2. I would like to get these as a list or as a data frame.
Is there a command for getting something like either one of these:
(1)
node1 node2 mutual
1 2 3, 4
1 3 2
1 4 2
2 3 1, 5
or (2)
node1 node2 mutual
1 2 3
1 2 4
1 3 2
1 4 2
2 3 1
2 3 5
I was able to get the number of mutual nodes between two nodes using this code:
# function to count the number of mutual friends between every pair of nodes
mutual_friends <- function(G) {
# initialize an empty matrix to store the number of mutual friends between pairs of nodes
num_nodes <- vcount(G)
mutual_friends <- matrix(0, nrow=num_nodes, ncol=num_nodes)
# loop over each node
for (node in 1:num_nodes) {
# get this node's list of friends
friends <- neighbors(G, node)
# add a count of 1 between all pairs of the node's friends
for (i in friends)
for (j in friends)
mutual_friends[i, j] = mutual_friends[i, j] + 1
}
# make the output readable with column names
dimnames(mutual_friends) <- list(row=V(G)$name, col=V(G)$name)
diag(mutual_friends) <- NA
mutual_friends
}
(coding credit to: https://rstudio-pubs-static.s3.amazonaws.com/72599_65ecae185590432cb2373df4825d2ef9.html#connected-components)
But I'm struggling with getting a list of the mutual nodes between every pair of nodes.
I appreciate any kind of advice and help. Thanks!
This is not exactly efficient, it's a brute force double loop, but you can do
get_mutuals <- function(g) {
do.call("rbind", lapply(seq.int(1, vcount(g)-1), function(i) {
do.call("rbind", lapply(seq.int(i+1, vcount(g)), function(j) {
ni <- neighbors(g, i)
nj <- neighbors(g, j)
overlap <- intersect(ni, nj)
if (length(overlap) > 0 && i %in% nj) {
data.frame(i=i, j=j, m=overlap)
} else {
NULL
}
}))
}))
}
get_mutuals(G)
Which will give you output that looks like your version 2.
i j m
1 1 2 3
2 1 2 4
3 1 3 2
4 1 4 2
5 2 3 1
...
If you want something more like your version 1, you could switch to data.frame(i=i, j=j, m=toString(overlap)) to paste all the values together in one column.
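If the long result already exists, it can also be collapsed afterwards. A minimal base R sketch, assuming the six rows of format (2) from the question typed in by hand (the names long and wide are just for illustration):

```r
# Long format (version 2), copied by hand from the question
long <- data.frame(i = c(1, 1, 1, 1, 2, 2),
                   j = c(2, 2, 3, 4, 3, 3),
                   m = c(3, 4, 2, 2, 1, 5))

# One row per (i, j) pair, with the mutual nodes pasted into one string
wide <- aggregate(m ~ i + j, data = long, FUN = toString)
wide <- wide[order(wide$i, wide$j), ]
wide
```

This keeps get_mutuals unchanged and leaves the choice of output format to the caller.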
Another possibility is to iterate the edges like this
get_mutuals <- function(g) {
do.call("rbind", lapply(seq.int(1, gsize(g)), function(i) {
edge <- ends(g, i)
i <- edge[1, 1]
j <- edge[1, 2]
ni <- neighbors(g, i)
nj <- neighbors(g, j)
overlap <- intersect(ni, nj)
if (length(overlap)) {
data.frame(i=i, j=j, m=overlap)
} else {
NULL
}
}))
}
get_mutuals(G)
Note that if two adjacent nodes share a common neighbor, they form a triangle. The function igraph::triangles gives you all triangles in a graph.
library(dplyr)
triangle_matrix <- matrix(igraph::triangles(G), ncol = 3, byrow = TRUE)
gtools::permutations(3, 3) %>%
apply(1, function(x) list(triangle_matrix[, x])) %>%
unlist(recursive = FALSE) %>%
Reduce(rbind, .) %>%
as.data.frame() %>%
filter(V1 < V2) %>%
arrange(V1, V2, V3)
You can get (1) by continuing the pipe with:
... %>% group_by(V1, V2) %>% summarise(mutual = list(V3))
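The triangle observation can also be checked without igraph. The sketch below is an alternative under stated assumptions, not the method from the answer: it rebuilds the adjacency matrix A from the edge list in the question and relies on the fact that entry (i, j) of A %*% A counts the neighbors shared by i and j.

```r
# Edge list from the question, one edge per row
edges <- matrix(c(1,2, 1,3, 1,4, 2,4, 2,3, 2,5, 3,5, 4,5, 5,6, 5,7, 7,8, 7,9),
                ncol = 2, byrow = TRUE)
n <- max(edges)

# Symmetric 0/1 adjacency matrix of the undirected graph
A <- matrix(0L, n, n)
A[edges] <- 1L
A[edges[, 2:1]] <- 1L

# common[i, j] = number of neighbors shared by i and j
common <- A %*% A

# Adjacent pairs (i < j) that share at least one neighbor
pairs <- which(A == 1L & common > 0 & upper.tri(A), arr.ind = TRUE)
mutual <- apply(pairs, 1, function(p) {
  toString(which(A[p[1], ] == 1L & A[p[2], ] == 1L))
})

res <- data.frame(node1 = pairs[, 1], node2 = pairs[, 2], mutual = mutual)
res <- res[order(res$node1, res$node2), ]
res
```

Dropping the A == 1L condition would also list non-adjacent pairs with a shared neighbor, which is closer to what the ego-based approach returns.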
Update
If you want to find out all directly connected nodes with a mutual node, you can try triangles in igraph like below
do.call(
rbind,
apply(
matrix(triangles(G), nrow = 3),
2,
function(v) {
u <- t(sapply(seq_along(v), function(k) t(v[-k])))
setNames(data.frame(cbind(v, rbind(u, u[, 2:1]))), c("node1", "node2", "mutual"))
}
)
)
which gives
node1 node2 mutual
1 5 2 3
2 2 5 3
3 3 5 2
4 5 3 2
5 2 3 5
6 3 2 5
7 5 2 4
8 2 5 4
9 4 5 2
10 5 4 2
11 2 4 5
12 4 2 5
13 2 1 4
14 1 2 4
15 4 2 1
16 2 4 1
17 1 4 2
18 4 1 2
19 2 1 3
20 1 2 3
21 3 2 1
22 2 3 1
23 1 3 2
24 3 1 2
Perhaps you can try ego like below
setNames(
data.frame(do.call(
rbind,
lapply(
Filter(
function(x) length(x) > 2,
ego(G)
),
function(v) {
cbind(t(combn(v[-1], 2)), v[1])
}
)
)),
c("node1", "node2", "mutual")
)
which gives
node1 node2 mutual
1 2 3 1
2 2 4 1
3 3 4 1
4 1 3 2
5 1 4 2
6 1 5 2
7 3 4 2
8 3 5 2
9 4 5 2
10 1 2 3
11 1 5 3
12 2 5 3
13 1 2 4
14 1 5 4
15 2 5 4
16 2 3 5
17 2 4 5
18 2 6 5
19 2 7 5
20 3 4 5
21 3 6 5
22 3 7 5
23 4 6 5
24 4 7 5
25 6 7 5
26 5 8 7
27 5 9 7
28 8 9 7
I have a simple dataframe with group IDs and elements of each group, like this:
x <- data.frame("ID" = c(1,1,1,2,2,2,3,3,3), "Values" = c(3,5,7,2,4,5,2,4,6))
Each ID may have a different number of elements. Now I want to find all IDs that have distinct elements with other IDs. In this example, ID1 and ID3 will be selected because they have distinct elements (3,5,7 vs 2,4,6). I also want to copy these unique IDs and their elements into a new dataframe, similar to the original.
How would I do that in R? My skills with R are quite limited.
Thank you very much!
Bests,
This seems like a good fit for igraph cliques with one edge to another clique, but I can't seem to wrap my head around how to use it.
Anyway, here is an option applying join to identify IDs with same Values and then anti-join to remove those IDs using data.table:
library(data.table)
DT <- as.data.table(x)
for (i in DT[, unique(ID)]) {
dupeID <- DT[DT[ID==i], on=.(Values), .(ID=unique(x.ID[x.ID!=i.ID]))]
DT <- DT[!dupeID , on=.(ID)]
}
output:
ID Values
1: 1 3
2: 1 5
3: 1 7
4: 3 2
5: 3 4
6: 3 6
x <- data.frame("ID" = c(1,1,1,2,2,2,3,3,3), "Values" = c(3,5,7,2,4,5,2,4,6))
gps = split(x, x$ID)
nGroups = length(gps)
k = 1
results = data.frame(ID = NULL, Values = NULL)
for(i in 1:(nGroups - 1)){
j = i + 1
while(j <= nGroups){
if(length(intersect(gps[[i]]$Values, gps[[j]]$Values)) == 0){
print(c(i,j))
results = rbind(results, gps[[i]], gps[[j]])
}
j = j + 1
}
}
results
> results
ID Values
1 1 3
2 1 5
3 1 7
7 3 2
8 3 4
9 3 6
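The pairwise test inside that loop can also be written with combn. This is a sketch of the same idea rather than a different algorithm, and the names vals, id_pairs, disjoint and keep_ids are made up for the example:

```r
x <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                Values = c(3, 5, 7, 2, 4, 5, 2, 4, 6))

# One vector of Values per ID
vals <- split(x$Values, x$ID)

# Every unordered pair of IDs, one pair per column
id_pairs <- combn(names(vals), 2)

# TRUE for pairs whose Values do not overlap at all
disjoint <- apply(id_pairs, 2, function(p) {
  length(intersect(vals[[p[1]]], vals[[p[2]]])) == 0
})

# Keep each ID that appears in at least one disjoint pair
keep_ids <- unique(as.vector(id_pairs[, disjoint]))
result <- x[as.character(x$ID) %in% keep_ids, ]
result
```

Unlike the rbind inside the loop, this keeps each selected ID only once even when it is disjoint from several other IDs.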
You can try the following code, where y is a list of data frames; each one holds a group plus every remaining group that shares no Values with it.
xs <- split(x,x$ID)
id <- names(xs)
y <- list()
ids <- seq_along(xs)
repeat {
if (length(ids)==0) break;
y[[length(y)+1]] <- xs[[ids[1]]]
p <- ids[[1]]
qs <- p
for (q in ids[-1]) {
if (length(intersect(xs[[p]]$Values,xs[[q]]$Values))==0) {
y[[length(y)]] <- rbind(y[[length(y)]],xs[[q]])
qs <- c(qs,q)
}
}
ids <- setdiff(ids,qs)
}
Example
x <- data.frame("ID" = c(1,1,1,2,2,2,3,3,3,4,4),
"Values" = c(3,5,7,2,4,5,2,4,6,1,3))
> x
ID Values
1 1 3
2 1 5
3 1 7
4 2 2
5 2 4
6 2 5
7 3 2
8 3 4
9 3 6
10 4 1
11 4 3
then you will get
> y
[[1]]
ID Values
1 1 3
2 1 5
3 1 7
7 3 2
8 3 4
9 3 6
[[2]]
ID Values
4 2 2
5 2 4
6 2 5
10 4 1
11 4 3
I have a data frame of n = 20 variables (number of columns) spread over b = 5 blocks (4 variables per block).
I would like to create p = 4 random and equal sized blocks of variables from the 5 blocks of variables.
I tried :
sample (x = 1: p, size = n, replace = TRUE)
[1] 1 1 1 1 1 1 1 1 1 2 2 2 3 3 3 4 4 4 4 4
Example of expected result (5 variables per block):
[1] 4 1 2 1 4 2 3 1 2 3 2 1 4 3 1 2 3 3 4 4
Thanks for your help !
You can try:
sample(x = rep(1:p,n/p), size = n, replace = FALSE)
Having discussed this in comments below, here is a solution:
Create a vector that looks like what you want, and then use sample to randomly sort it by sampling the whole vector without replacement:
p <- 4
b <- 5
sample(rep(1:p, b), size = p * b)
[1] 3 1 4 3 3 4 1 1 4 2 2 4 3 2 1 2 2 4 3 1
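To see why this gives equal-sized blocks while the original attempt did not: with replace = TRUE every draw is independent, so block sizes vary, whereas shuffling a vector that already contains each label the right number of times cannot change the counts. A small check, assuming n is a multiple of p (the seed is arbitrary):

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 20        # number of variables
p <- 4         # number of blocks wanted

# Each label appears n / p times; sample() only shuffles the order
blocks <- sample(rep(seq_len(p), length.out = n))

table(blocks)  # every block label has the same count
```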
I have a dataset that looks like the following
id = c(1,1,1,2,2,2,3,3,4)
cycle = c(1,2,3,1,2,3,1,3,2)
value = 1:9
data.frame(id,cycle,value)
> data.frame(id,cycle,value)
id cycle value
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 4
5 2 2 5
6 2 3 6
7 3 1 7
8 3 3 8
9 4 2 9
Basically, id identifies the sample, cycle identifies the timepoint, and value holds the value at that timepoint.
As you can see, sample 3 has no cycle 2 data and sample 4 is missing cycle 1 and 3 data. Is there a way, without writing a loop, to insert NAs where there is no data? I would like my dataset to look like the following:
> data.frame(id,cycle,value)
id cycle value
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 4
5 2 2 5
6 2 3 6
7 3 1 7
8 3 2 NA
9 3 3 8
10 4 1 NA
11 4 2 9
12 4 3 NA
I am able to solve this problem with a lot of loops and if statements but the code is extremely long and cumbersome (I have many more columns in my real dataset).
Also, the number of samples I have is very large so I need something that is generalizable.
Using merge and expand.grid, we can come up with a solution. expand.grid creates a data.frame with all combinations of the supplied vectors (so you'd supply it with the id and cycle variables). By merging to your original data (and using all.x = T, which is like a left join in SQL), we can fill in those rows with missing data in dat with NA.
id = c(1,1,1,2,2,2,3,3,4)
cycle = c(1,2,3,1,2,3,1,3,2)
value = 1:9
dat <- data.frame(id,cycle,value)
grid_dat <- expand.grid(id = 1:4,
cycle = 1:3)
# or you could do (HT @jogo):
# grid_dat <- expand.grid(id = unique(dat$id),
# cycle = unique(dat$cycle))
merge(x = grid_dat, y = dat, by = c('id','cycle'), all.x = T)
id cycle value
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 4
5 2 2 5
6 2 3 6
7 3 1 7
8 3 2 NA
9 3 3 8
10 4 1 NA
11 4 2 9
12 4 3 NA
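Since the question mentions many more columns in the real data, here is a sketch of the same merge with one extra column. The score column is hypothetical and only there to show that every non-key column is filled with NA on the inserted rows:

```r
dat <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 4),
                  cycle = c(1, 2, 3, 1, 2, 3, 1, 3, 2),
                  value = 1:9,
                  score = seq(0.1, 0.9, by = 0.1))  # hypothetical extra column

# All id/cycle combinations that should exist
grid_dat <- expand.grid(id = unique(dat$id), cycle = unique(dat$cycle))

# Left join: keep every grid row, fill unmatched rows with NA
full <- merge(grid_dat, dat, by = c("id", "cycle"), all.x = TRUE)
full <- full[order(full$id, full$cycle), ]
full
```

Only the key columns need to be named in by; the remaining columns come along automatically.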
A solution based on the tidyverse (complete() comes from tidyr).
library(tidyverse)
# Create example data frame
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 4)
cycle <- c(1, 2, 3, 1, 2, 3, 1, 3, 2)
value <- 1:9
dt <- data.frame(id, cycle, value)
# Complete the combination between id and cycle
dt2 <- dt %>% complete(id, cycle)
Here is a solution with data.table doing a cross join:
library("data.table")
d <- data.table(id = c(1,1,1,2,2,2,3,3,4), cycle = c(1,2,3,1,2,3,1,3,2), value = 1:9)
d[CJ(id=id, cycle=cycle, unique=TRUE), on=.(id,cycle)]
I have repeated-measures data.
I need to create a loop that will incrementally count each observation, within a participant, and label it.
I am new to writing loops. My logic was to say, for each item in the list of unique ids, count each row in that, and apply some function to that row.
Could someone point out what I am doing wrong?
data$Ob <- 0
for (i in unique(data$id)) {
count <- 1
for (u in data[data$id == i,]) {
data[data$id ==u,]$Ob <- count
count <- count + 1
print(count)
}
}
Thanks!
Justin
You can also use ave:
set.seed(1)
data <- data.frame(id = sample(4, 10, TRUE))
data$Ob = ave(data$id, data$id, FUN=seq_along)
data
id Ob
1 2 1
2 2 2
3 3 1
4 4 1
5 1 1
6 4 2
7 4 3
8 3 2
9 3 3
10 1 2
# Generate some dummy data
data <- data.frame(Ob=0, id=sample(4,20,TRUE))
# Go through every id value
for(i in unique(data$id)){
# Label observations
data$Ob[data$id == i] = 1:sum(data$id == i)
}
Be aware though that for loops are notoriously slow in R. In this simple case they work fine, but should you have millions and millions of rows in your data frame you'd better do something purely vectorized.
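As for what goes wrong in the loop from the question: for (u in data[data$id == i,]) iterates over the columns of the subsetted data frame, not over its rows, so u is a whole column rather than one observation. A minimal corrected sketch, assuming the goal is to keep the original nested-loop structure, walks over row indices instead (the ids are typed in by hand to match the ave example above):

```r
data <- data.frame(id = c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1))
data$Ob <- 0

for (i in unique(data$id)) {
  count <- 1
  # which() returns the row numbers belonging to participant i
  for (r in which(data$id == i)) {
    data$Ob[r] <- count
    count <- count + 1
  }
}
data
```

The inner loop does the same job as 1:sum(data$id == i) in the shorter version, just one row at a time.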
But you don't need a loop...
data <- data.frame (id = sample (4, 10, TRUE))
## id
## 1 3
## 2 4
## 3 1
## 4 3
## 5 3
## 6 4
## 7 2
## 8 1
## 9 1
## 10 4
data$Ob [order (data$id)] <- sequence (table (data$id))
## id Ob
## 1 3 1
## 2 4 1
## 3 1 1
## 4 3 2
## 5 3 3
## 6 4 2
## 7 2 1
## 8 1 2
## 9 1 3
## 10 4 3
(works also with character or factor IDs)
(isn't R just cool!?)