how to output the adjacency group from edge list? - r

Edge list:
el <- rbind(c(6,11), c(1,8), c(8,12), c(11,17),
c(6,7), c(7,11), c(18,19), c(6,16))
In this case, 6, 11, 17, 7 and 16 are connected, 1, 8 and 12 are connected, and 18 and 19 are connected with each other. So there are three groups. How can I get these groups from this edge list using R?

You are looking for connected components: http://en.wikipedia.org/wiki/Connected_component_(graph_theory) These are (somewhat confusingly) called clusters in igraph. Getting the list of components takes a few steps:
library(igraph)
g <- graph.data.frame(data.frame(el))
clu <- clusters(g)
comp <- tapply(seq_along(clu$membership), clu$membership, simplify = FALSE, function(x) x)
lapply(comp, function(x) V(g)$name[x])
# $`1`
# [1] "6" "11" "7" "17" "16"
#
# $`2`
# [1] "1" "8" "12"
#
# $`3`
# [1] "18" "19"
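If installing igraph is not an option, the same grouping can be sketched in base R by propagating the smallest label along every edge until nothing changes. This is only an illustrative sketch: the function name connected_groups is hypothetical, and it assumes the edge list is a two-column matrix like el above.

```r
el <- rbind(c(6, 11), c(1, 8), c(8, 12), c(11, 17),
            c(6, 7), c(7, 11), c(18, 19), c(6, 16))

# Hypothetical helper: label propagation over an undirected edge list.
connected_groups <- function(el) {
  nodes <- sort(unique(as.vector(el)))
  label <- seq_along(nodes)          # every node starts in its own group
  from <- match(el[, 1], nodes)
  to <- match(el[, 2], nodes)
  repeat {
    old <- label
    for (k in seq_along(from)) {
      # both ends of an edge take the smaller of their two labels
      m <- min(label[from[k]], label[to[k]])
      label[from[k]] <- m
      label[to[k]] <- m
    }
    if (identical(old, label)) break # stop once a full pass changes nothing
  }
  unname(split(nodes, label))
}

groups <- connected_groups(el)
print(groups)
```

For large graphs the igraph version is likely to be much faster; the loop here is meant to show the idea rather than to be efficient.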

Related

Create all possible numeric combinations given n values

I want to create a function that, given a vector of n numbers, creates all possible combinations from them (the same thing expand.grid() does, really).
For example, given the vector [1, 2] I want the function to output all possible combinations, i.e., "11", "12", "21", "22".
This is what I've come up with:
V <- trunc(runif(3, 0, 9))
FE2 <- function(V){
C <- c()
n <- 0
for(i in 1:length(V)){
for (j in 1:length(V)){
for (k in 1:length(V)){
C <- c(paste0(V[i], V[j], V[k]))
n <- n + 1
print(c(paste0("Number of combination: ", n), paste0("Combination: ", C)))
}
}
}
}
FE2(V)
The function does work. The problem is that for each element of the original vector, I have to add another for(). That is, if I wanted to compute all possible combinations for a vector of 10 elements, say [1, 2, ..., 10], I would have to create 10 for() loops in my function. I wonder if there's another, more efficient way to do it, as long as it does not differ significantly from my current solution.
With expand.grid you then need to "collapse" the rows:
apply( expand.grid( 1:2, 1:2), 1, paste0, collapse="")
[1] "11" "21" "12" "22"
I'm not sure if I understood the goals but here's that method applied to 1:10 with 2 and then 3 instances of the vector.
> str(apply( expand.grid( 1:10, 1:10), 1, paste0, collapse="") )
chr [1:100] "11" "21" "31" "41" "51" "61" "71" "81" "91" "101" "12" "22" "32" "42" "52" "62" "72" "82" "92" ...
> str(apply( expand.grid( 1:10, 1:10, 1:10), 1, paste0, collapse="") )
chr [1:1000] "111" "211" "311" "411" "511" "611" "711" "811" "911" "1011" "121" "221" "321" "421" "521" "621" ...
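To avoid typing the vector once per position, the list of arguments can be built with rep() and passed to expand.grid() as a single list. A minimal sketch, assuming the number of positions should default to the length of the vector; the function name all_combinations and the argument k are hypothetical:

```r
# Hypothetical helper: all length-k strings built from the elements of v.
all_combinations <- function(v, k = length(v)) {
  # expand.grid() accepts a list of vectors; the first column varies fastest
  grid <- expand.grid(rep(list(v), k), stringsAsFactors = FALSE)
  # paste the columns together row by row
  do.call(paste0, unname(as.list(grid)))
}

all_combinations(1:2)
# [1] "11" "21" "12" "22"
length(all_combinations(1:3))
# [1] 27
```

The result has length(v)^k elements, so it grows very quickly with k.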
x <- 1:3
Use Reduce to iteratively apply the same operation
op <- function (a, b) {
paste(rep(a, times = length(b)), rep(b, each = length(a)), sep = "-")
}
Reduce(op, rep(list(x), length(x)))
# [1] "1-1-1" "2-1-1" "3-1-1" "1-2-1" "2-2-1" "3-2-1" "1-3-1" "2-3-1" "3-3-1"
#[10] "1-1-2" "2-1-2" "3-1-2" "1-2-2" "2-2-2" "3-2-2" "1-3-2" "2-3-2" "3-3-2"
#[19] "1-1-3" "2-1-3" "3-1-3" "1-2-3" "2-2-3" "3-2-3" "1-3-3" "2-3-3" "3-3-3"
Generate permutations
library(RcppAlgos)
permuteGeneral(x, length(x), repetition = TRUE,
FUN = function (x) paste0(x, collapse = "-"))

Return row names of dataframe meeting condition stored in list using lapply

I have a list containing data frames, each with varying observations. Here is an example of what I am working with:
set.seed(9)
df<- data.frame(x1 = round(runif(80, 0, 5)),
x2 = round(runif(80, 0, 15)),
x3 = sample(letters[1:8], 80, replace = TRUE))
my.list <- vector(mode = "list", length = 8)
my.list <- lapply(unique(df$x3), function(x) {subset(df, x3 == x)})
What I am trying to achieve is to find the row names of each data frame for which a condition is met. Using lapply() the nearest I have got to what I want to achieve is with the code below. However, this returns NA's; I am not sure why this happens.
> lapply(my.list, function(x) {x <- row.names(x[which(x[,1:2] < 5), ]); x})[[1]]
[1] "1" "5" "11" "22" "46" "53" "61" "63" "64" "79" "80" "NA" "NA.1" "NA.2" "NA.3"
What needs to be done so that only the row names are returned?
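The NAs appear because x[,1:2] < 5 is a logical matrix, so which() returns linear indices over both columns, and some of them are larger than the number of rows. Replace your lapply(my.list, function(x) {x <- row.names(x[which(x[,1:2] < 5), ]); x})
with the call below, which tests only the first column, so that only valid row indices go into the row.names call: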
lapply(my.list, function(x) {x <- row.names(x[which(x[,1:2][1] < 5), ]); x})
Here's one way to do it, keeping only the rows where both columns are below 5:
v = lapply(my.list, function(x) {
y <- rowSums(x[,1:2] < 5) == 2
names(y)[y]
})
print(v)
[[1]]
[1] "1" "13" "65"
[[2]]
[1] "20" "48" "58" "63"
[[3]]
[1] "3" "43"
[[4]]
[1] "5" "12" "24" "77" "80"
[[5]]
[1] "8" "31"
[[6]]
[1] "25"
[[7]]
[1] "17" "19" "23" "49" "60" "62"
[[8]]
[1] "15" "30" "40"
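The source of the NA row names can be reproduced on a tiny data frame. The sketch below uses made-up data (the data frame x and its row names are assumptions for illustration) to show that which() on a logical matrix returns linear indices, and how rowSums() turns the same matrix into a per-row condition:

```r
# Made-up example data, 3 rows and 2 columns
x <- data.frame(x1 = c(1, 7, 2), x2 = c(9, 3, 4),
                row.names = c("a", "b", "c"))

hits <- x[, 1:2] < 5   # 3 x 2 logical matrix
idx <- which(hits)     # linear indices, counted column by column
print(idx)
# [1] 1 3 5 6   (5 and 6 exceed nrow(x), hence the NA rows)

# Per-row conditions instead of linear indices
both <- rownames(x)[rowSums(hits) == 2]  # rows where both columns are < 5
any_col <- rownames(x)[rowSums(hits) > 0] # rows where at least one is < 5
print(both)
print(any_col)
```

which(hits, arr.ind = TRUE) is another option when the row and column of each hit are needed.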

creating a matrix/dataframe with two for loops in R

This is my first post on SO, so be kind!
My question is vaguely related to this one:
Double for loop in R creating a matrix
I want to create a matrix/dataframe and the approach my mind has chosen is to nest two for loops, one to create the first row and the second to repeat it for the rows I need.
I could successfully create the first loop, but I can't seem to iterate it for the number of rows I need.
I'm sure that there is a better way to do this, anyway, this is the for loop that gives the result I need for the first row:
x <- character(0)
for(j in 1:18){
x <- c(x, sum(it_mat[1, 2:26] == j))
}
it_mat is a matrix of 417 rows and 26 columns, where the first column is a string vector with various names and the subsequent columns are randomly generated numbers from 1 to 18.
Here's the first row:
[1,] "Charlie" "14" "3" "9" "14" "3" "9" "11" "11" "18" "17" "16" "5" "18" "6" "10" "3" "9" "9" "3" "18" "12" "8" "5" "5" "4"
I want to create a matrix/df where I count how many times, for each name, each number appeared.
The for loop I created above gives me the result I want for the first row:
x
[1] "0" "0" "4" "1" "3" "1" "0" "1" "4" "1" "2" "1" "0" "2" "0" "1" "1" "3"
I really can't iterate it for the subsequent rows with another for loop, there must be something very mundane that I do wrong.
This is my best attempt:
tr_mat <- matrix(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat <- rbind(tr_mat, x)
}
I worked on it all afternoon and now I give up and am reaching out to you. Before you give me the correct way to do it, please explain what I'm doing wrong in the nested for loops attempt; I might learn something.
I hope I explained myself, sorry if I've been too verbose.
Thanks for your time.
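@RuiBarradesh has pinpointed the actual problem in the OP's last attempt: tr_mat is preallocated with 147 rows of NA, and rbind then appends each new row below them instead of filling them in. There is another way to fix the OP's code while still using rbind.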
# Do not create rows at this place. Let the rows be added with rbind
tr_mat <- matrix(nrow = 0, ncol = 18) #(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat <- rbind(tr_mat, x)
}
tr_mat # This will display correct result too
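The effect of combining a preallocated matrix with rbind can be seen on a small made-up example (3 rows instead of 147; the variable names are only for illustration):

```r
# Preallocate 3 rows AND append with rbind: the NA rows stay on top
m_bad <- matrix(NA, nrow = 3, ncol = 2)
for (i in 1:3) m_bad <- rbind(m_bad, c(i, i^2))
dim(m_bad)
# [1] 6 2

# Start from zero rows and let rbind add them
m_rbind <- matrix(nrow = 0, ncol = 2)
for (i in 1:3) m_rbind <- rbind(m_rbind, c(i, i^2))

# Or keep the preallocation and assign by row index
m_index <- matrix(NA, nrow = 3, ncol = 2)
for (i in 1:3) m_index[i, ] <- c(i, i^2)

all(m_rbind == m_index)
# [1] TRUE
```

Assigning by index is usually preferable for large matrices, since rbind copies the whole matrix on every iteration.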
Another way, using base R. Note that *apply functions are loops in disguise.
tr_mat2 <- sapply(1:18, function(j) sapply(1:147, function(i) sum(it_mat[i, 2:26] == j)))
Note that this code will produce a matrix of numbers, while your tr_mat is of mode character:
all.equal(tr_mat, tr_mat2)
#[1] "Modes: character, numeric"
DATA.
This is the dataset generation code that I have used to test the code above.
set.seed(7966) # make the results reproducible
it_mat <- t(replicate(147, c(sample(letters, 1), sample(18, 25, TRUE))))
EDIT.
Following the suggestion in the comments by MKR, here is the OP's code corrected with my modification in the comment to his (the OP's) post.
tr_mat <- matrix(, nrow = 147, ncol = 18)
for(i in 1:147){
x <- character()
for(j in 1:18){
x <- c(x, sum(it_mat[i, 2:26] == j))
}
tr_mat[i, ] <- x
}
This is the code that I have used to produce the matrix tr_mat referred to in the all.equal test above.
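A more compact base R variant counts each row in one step with tabulate(). This is a sketch under the assumption that columns 2 to 26 hold the integers 1 to 18 stored as character strings, as in the test data above; the helper name count_row is hypothetical.

```r
set.seed(7966) # same test data as above
it_mat <- t(replicate(147, c(sample(letters, 1), sample(18, 25, TRUE))))

# Hypothetical helper: how often each of 1..18 occurs in one row
count_row <- function(r) tabulate(as.integer(r), nbins = 18)

# apply() returns one column per row of the input, hence the transpose
counts <- t(apply(it_mat[, 2:26], 1, count_row))
dim(counts)
# [1] 147  18
```

Unlike tr_mat, counts is a numeric matrix, so it can be used directly in arithmetic.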
Do you really need 2 loops? Here is a solution without any loop, using data.table and a combination of the melt/dcast functions:
library(data.table)
# dataset ----------------------------
set.seed(2018)
it_mat<-data.frame(c1=c('Charlie','John','Peter'))
for(i in 2:26){
it_mat[,paste0('c',i)]<-sample(1:18,3)
}
# calculation ----------------------------
it_mat<-data.table(it_mat)
it_mat<-melt(it_mat,id.vars='c1')
it_mat[,.N,by=.(c1,value)][order(c1,value)]
dcast(it_mat[,.N,by=.(c1,value)][order(c1,value)],c1~value)

Extract the order list from an ordered correlation matrix: R

I created a correlation matrix and visualized it using the corrplot function with the following code
temp<-matrix(rexp(25, rate=.1), ncol=5)
tempCor<-cor(temp)
tempCor <- data.frame(tempCor)
names(tempCor) <- c(1:5)
corrplot(t(tempCor),method="pie",order="AOE")
Here is the result of the corrplot function (image not shown).
Is there any way to get the order list from this result, which is (4,5,1,3,2)?
Try this:
library(corrplot)
set.seed(1234)
temp <- matrix(rexp(25, rate=.1), ncol=5)
tempCor <- cor(temp)
tempCor <- data.frame(tempCor)
names(tempCor) <- c(1:5)
out <- corrplot(t(tempCor),method="pie",order="AOE")
dimnames(out)
Here is what you are looking for:
[[1]]
[1] "5" "1" "3" "4" "2"
[[2]]
[1] "1" "2" "3" "4" "5"
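For intuition, the "AOE" ordering (angular order of the eigenvectors) can be approximated in base R from the first two eigenvectors of the correlation matrix. The sketch below follows my reading of the definition given in the corrplot documentation; the function name aoe_order is hypothetical, and the result is not guaranteed to match corrplot's own output in every case (eigenvector signs are not unique, for instance).

```r
set.seed(1234)
temp <- matrix(rexp(25, rate = 0.1), ncol = 5)
cor_mat <- cor(temp)

# Hypothetical helper: order variables by the angle of their loadings
# on the first two eigenvectors (assumed AOE definition)
aoe_order <- function(corr) {
  ev <- eigen(corr)$vectors[, 1:2]
  e1 <- ev[, 1]
  e2 <- ev[, 2]
  angle <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)
  order(angle)
}

ord <- aoe_order(cor_mat)
print(ord)
```

When corrplot is installed, reading the order from the object it returns, as shown above, remains the more reliable route.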

Merging two vectors on precisely defined intervals

Given are two vectors:
vec_nums <- 1:20
vec_ltrs <- letters[1:10]
I would like to write a function that would merge them so that each element from the second vector appears at a precisely defined position within the first vector. For example, running:
vec_mrg <- funMergeVectsByPlace(x = vec_nums, y = vec_ltrs, position = 3)
Should return vec_mrg of the following content:
[1] "a" "b" "1" "c" "d" "2" "f" "g" "3" "i" "j" "4" "l" "m" "5" ...
Desired characteristics:
The function places the elements from the vector passed via y = at the position given in position =, counting from the left-hand side. So position = 3 should be understood as "every third place", i.e. places 3, 6, ...
The function should work on numeric, string and factor vectors and return an ordered factor.
If the vector y is shorter than the number of inserts needed for x, the function should return the remaining part of x without any additions.
Suggested structure
I would envisage for the function to be of this structure:
funMergeVectsByPlace <- function(x, y, position = 3) {
# Convert
vec_a <- as.character(x)
vec_b <- as.character(y)
# Missing part
# Combine two vectors
# Create ordered factor
vec_fac <- factor(vec_mrg,
# levels =
# I want the levels to reflect the order of elements in vec_mrg
)
# Return
return(vec_fac)
}
Samples
Simplest
Concerning attempts, simplest approach:
vec_mrg <- c(vec_nums, vec_ltrs)
vec_mrg <- order(vec_mrg)
But this would not create the order
Loop
for (i in 1:length(vec_nums)) {
pos <- position
vec_nums[pos] <- vec_ltrs[i]
pos <- pos + pos
# i will be out of bounds and the way to move the other vector is missing
}
vec_mrg <- function(x,y,pos) {
res <- y
counter <- seq(floor(length(y)/(pos-1)))
for(i in counter) {
res <- append(res, x[i], seq(pos-1,by=pos, length.out=length(counter))[i])
}
res
}
vec_mrg(vec_nums, vec_ltrs, 3)
#[1] "a" "b" "1" "c" "d" "2" "e" "f" "3" "g" "h" "4" "i" "j"
#[15] "5"
A loop-free solution:
funMergeVectsByPlace <- function( x, y, position )
{
n <- min( length(y)%/%(position-1), length(x) )
A <- rbind( matrix(head(y,n*(position-1)),position-1), head(x,n) )
rest <- c( x[-(1:n)], y[-(1:(n*(position-1)))] )
c(c(A),rest)
}
Speed comparison with Lafortune's solution:
> library(microbenchmark)
> vec_nums <- 1:20
> vec_ltrs <- letters[1:10]
> microbenchmark(Lafortune = vec_mrg(vec_nums,vec_ltrs,3),
+ mra68 = funMergeVectsByPlace(vec_nums,vec_ltrs,3),
+ times .... [TRUNCATED]
Unit: microseconds
expr min lq mean median uq max neval
Lafortune 137.677 143.112 161.12006 146.734 153.980 2931.512 10000
mra68 77.443 81.067 92.13208 83.331 86.954 2718.204 10000
Larger vectors:
> vec_nums <- 1:2000
> vec_ltrs <- letters[rep(1:10,100)]
> microbenchmark(Lafortune = vec_mrg(vec_nums,vec_ltrs,3),
+ mra68 = funMergeVectsByPlace(vec_nums,vec_ltrs,3),
+ times .... [TRUNCATED]
Unit: milliseconds
expr min lq mean median uq max neval
Lafortune 32.993883 40.991796 63.758011 51.171020 90.122351 456.9748 1000
mra68 1.101865 1.489533 2.468496 1.751299 3.338881 230.0460 1000
> v1 <- vec_mrg(vec_nums,vec_ltrs,3)
> v2 <- funMergeVectsByPlace(vec_nums,vec_ltrs,3)
>
Notice that the vec_mrg function does not append the rest of the x vector to the result, but funMergeVectsByPlace does. Otherwise the results are the same:
> v1 <- vec_mrg(1:20,letters[1:10],3)
> v2 <- funMergeVectsByPlace(1:20,letters[1:10],3)
> v1
[1] "a" "b" "1" "c" "d" "2" "e" "f" "3" "g" "h" "4" "i" "j" "5"
> v2
[1] "a" "b" "1" "c" "d" "2" "e" "f" "3" "g" "h" "4" "i" "j" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20"
> identical(v1,v2[1:length(v1)])
[1] TRUE
>
Neither vec_mrg nor funMergeVectsByPlace returns a factor. If one includes factor(...), both functions get slower, but funMergeVectsByPlace is still faster than vec_mrg.
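The ordered factor requested in the question can be obtained by setting the levels to the merged values in their order of appearance. A minimal sketch, assuming position is at least 2; the function name merge_as_factor is hypothetical, and the interleaving follows the rbind idea of the loop-free solution above:

```r
# Hypothetical helper: interleave y and x, then return an ordered factor
merge_as_factor <- function(x, y, position = 3) {
  x <- as.character(x)
  y <- as.character(y)
  step <- position - 1                     # y elements per block
  n <- min(length(y) %/% step, length(x))  # number of complete blocks
  # each column holds `step` elements of y followed by one element of x
  blocks <- rbind(matrix(y[seq_len(n * step)], nrow = step), x[seq_len(n)])
  # flatten column by column, then append whatever is left of x and y
  merged <- c(blocks, x[seq_along(x) > n], y[seq_along(y) > n * step])
  # levels follow the order of first appearance in the merged vector
  factor(merged, levels = unique(merged), ordered = TRUE)
}

res <- merge_as_factor(1:20, letters[1:10], 3)
head(as.character(res), 6)
# [1] "a" "b" "1" "c" "d" "2"
```

If x and y share values, unique() collapses them into a single level, which may or may not be what is wanted.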

Resources