I have run into a problem.
Consider the following scenario:
# Create a graph
g1 <- graph.full(5)
V(g1)$name <- letters[1:vcount(g1)]
V(g1)
Vertex sequence:
[1] "a" "b" "c" "d" "e"
# Contract vertex "a" and "b"
vec = c(1,1,2,3,4)
contract_1 <- contract.vertices(g1, vec, vertex.attr.comb=toString)
V(contract_1)
Vertex sequence:
[1] "a, b" "c" "d" "e"
# Contract vertex "a, b" and "c"
vec = c(1,1,2,3)
contract_2 <- contract.vertices(contract_1, vec, vertex.attr.comb=toString)
V(contract_2)
Vertex sequence:
[1] "a, b, c" "d" "e"
And so on ... (contract "a, b, c" and "d", creating the vertex "a, b, c, d")
I need to distinguish the vertices that were merged at the previous level.
E.g.:
When contracting the vertices "a, b" and "c", I need an extra separator such as "|" or ";". In this case, the result would be "a, b | c" or "a, b- c" or "a, b; c".
When contracting the vertices "a, b, c" and "d", the result would be "a, b, c | d" or "a, b, c; d".
I tried a few things ...
Eg.:
g <- contract.vertices(g, matching,
vertex.attr.comb=list(name=function(x) paste(toString(x,"",sep=";"))))
However, this does not work.
paste also has a collapse argument:
contract.vertices(
contract_1,
c(1,1,2,3),
vertex.attr.comb = list( name = function(x) paste(x, collapse=";") )
)
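If only the most recent merge should carry the special separator (as in "a, b, c | d" from the question), one option is a small combiner that first flattens any older separator. This is only a sketch in base R: join_level is a hypothetical helper name, and the separators " | " and ", " are assumptions taken from the examples in the question.

```r
# Hypothetical helper: join vertex names, marking only the newest merge with `sep`.
join_level <- function(x, sep = " | ") {
  if (length(x) < 2) return(x)               # vertex was not merged at this level
  flat <- gsub(sep, ", ", x, fixed = TRUE)   # demote separators from earlier levels
  paste(flat, collapse = sep)
}

join_level(c("a, b", "c"))      # "a, b | c"
join_level(c("a, b | c", "d"))  # "a, b, c | d"
```

Assuming contract.vertices passes each group's names to the combiner as a character vector, it should be usable as vertex.attr.comb = list(name = join_level).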
You could also use nested parentheses:
library(igraph)
g <- list()
k <- 5
g[[1]] <- graph.full(k)
V(g[[1]])$name <- letters[1:vcount(g[[1]])]
for(i in 2:k) {
g[[i]] <- contract.vertices(
g[[i-1]],
c(1,1,2:k)[1:(k-i+2)],
vertex.attr.comb = list( name = function(x)
if( length(x) > 1 ) paste0( "(", paste0(x,collapse=","), ")" ) else x
)
)
}
lapply(g, V)
# [[1]]
# Vertex sequence:
# [1] "a" "b" "c" "d" "e"
# [[2]]
# Vertex sequence:
# [1] "(a,b)" "c" "d" "e"
# [[3]]
# Vertex sequence:
# [1] "((a,b),c)" "d" "e"
# [[4]]
# Vertex sequence:
# [1] "(((a,b),c),d)" "e"
# [[5]]
# Vertex sequence:
# [1] "((((a,b),c),d),e)"
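To see how the nested labels build up without going through igraph, the same naming rule can be folded over a plain character vector. This is just an illustrative sketch with base R's Reduce; it reproduces the first label of each level, not the graph contraction itself.

```r
# Fold the names left to right, wrapping each merge in parentheses.
labels <- Reduce(
  function(acc, nxt) paste0("(", acc, ",", nxt, ")"),
  letters[1:5],
  accumulate = TRUE   # keep every intermediate label, one per level
)
labels
# [1] "a"  "(a,b)"  "((a,b),c)"  "(((a,b),c),d)"  "((((a,b),c),d),e)"
```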
I want to randomize/shuffle a vector. Some of the vector elements are identical. After shuffling, identical elements should have a minimum distance of three (i.e. two other elements should be between identical elements).
Consider the following example vector in R:
x <- rep(LETTERS[1:5], 3) # Create example vector
x
# [1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E"
If I shuffle my vector using the sample function, some of the identical elements may be too close together. For instance, if I use the following R code, the element "C" appears directly after each other at positions 5 and 6:
set.seed(53135)
sample(x) # sample() function puts same elements too close
# [1] "B" "A" "E" "D" "C" "C" "E" "A" "B" "C" "D" "E" "A" "D" "B"
How could I ensure that identical elements have a minimum distance of three?
So basically, at each step we need to sample one element of the x vector that has not been selected in the last min.dist draws (that is what tail(.x, min.dist) picks out below). Using purrr's reduce we can achieve this:
library(purrr)
min.dist <- 2
reduce(integer(length(x)-1), ~c(.x, sample(x[!x %in% tail(.x, min.dist)], 1)), .init=sample(x,1))
[1] "A" "E" "D" "B" "A" "D" "E" "C" "D" "A" "C" "E" "B" "A" "E"
Bundled in a function
shuffle <- function(x, min.dist=2){
stopifnot(min.dist < length(unique(x)))
reduce(integer(length(x)-1), ~c(.x, sample(x[!x %in% tail(.x, min.dist)], 1)), .init=sample(x,1))
}
> shuffle(x, 3)
[1] "A" "C" "B" "D" "E" "A" "B" "C" "E" "D" "A" "B" "C" "E" "A"
> shuffle(x, 3)
[1] "A" "B" "D" "E" "C" "A" "B" "D" "E" "C" "A" "D" "E" "C" "A"
> shuffle(x, 4)
[1] "C" "E" "D" "A" "B" "C" "E" "D" "A" "B" "C" "E" "D" "A" "B"
> shuffle(x, 4)
[1] "A" "B" "D" "E" "C" "A" "B" "D" "E" "C" "A" "B" "D" "E" "C"
> shuffle(x, 2)
[1] "E" "A" "D" "E" "B" "D" "A" "E" "C" "D" "A" "E" "C" "A" "B"
> shuffle(x, 2)
[1] "B" "A" "D" "C" "B" "A" "E" "B" "A" "E" "B" "C" "D" "A" "E"
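A small sketch of what the filter inside reduce does at a single step may help. The partial result so_far is a made-up example; the rest uses the same expressions as the function above.

```r
x <- rep(LETTERS[1:5], 3)
so_far <- c("A", "E", "D")             # hypothetical partial result
min.dist <- 2

blocked <- tail(so_far, min.dist)      # the last min.dist picks: "E" "D"
candidates <- x[!x %in% blocked]       # everything in x except blocked values
table(candidates)
# candidates
# A B C
# 3 3 3
```

Note that "A" is still offered three times even though one "A" has already been used, so this version does not preserve the original counts (the shuffle(x, 2) output above contains "E" four times).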
After @27ϕ9's comment:
shuffle <- function(x, min.dist=2){
stopifnot(min.dist < length(unique(x)))
reduce(integer(length(x)-1), ~ c(.x, sample(x[!x %in% tail(.x, min.dist) &( x %in% names(t <- table(x[x%in%.x]) > table(.x))[t] | !x %in% .x)], 1)), .init=sample(x,1))
}
> table(shuffle(rep(LETTERS[1:5], 3),2))
A B C D E
3 3 3 3 3
> table(shuffle(rep(LETTERS[1:5], 3),2))
Error in sample.int(length(x), size, replace, prob) :
invalid first argument
UPDATE
After some trial and error, and given that there will not always be enough elements to keep every value min.dist apart, I came up with the solution below. Of the versions above, this one is the most thoroughly commented:
shuffle <- function(x, min.dist=2){
stopifnot(min.dist < length(unique(x)))
reduce(integer(length(x)-1), function(.x, ...){
# whether the value is in the tail of the aggregated vector
in.tail <- x %in% tail(.x, min.dist)
# whether a value still hasn't reached the max frequency
freq.got <- x %in% names(t<-table(x[x%in%.x]) > table(.x))[t]
# whether a value isn't in the aggregated vector
yet <- !x %in% .x
# the if is there basically to account for the cases when we don't have enough vars to space out the vectors
c(.x, if(any((!in.tail & freq.got) | yet )) sample(x[(!in.tail & freq.got) | yet ], 1) else x[which(freq.got)[1]] )
}, .init=sample(x,1))
}
Now, running table(shuffle(rep(LETTERS[1:5], 3),2)) will always return 3 for every value, and we can say with some certainty that the values in the vector are spaced with a minimum distance of 2. The only way to guarantee that no elements are duplicated is to use min.dist=length(unique(x))-1. Otherwise there will be cases where up to r < min.dist elements are not min.dist away from their previous occurrence, and if such elements exist they will be in the length(x) + 1 - 1:min.dist subset (i.e. the tail) of the resulting vector.
To be completely certain, we can use a loop that checks whether the tail of the output vector has unique values (remove the print statement; I used it just for demonstration purposes):
shuffler <- function(x, min.dist=2){
while(!length(unique(print(tail(l<-shuffle(x, min.dist=min.dist), min.dist+1))))==min.dist+1){}
l
}
table(print(shuffler(rep(LETTERS[1:5], 3),2)))
[1] "A" "B" "C" "E" "B" "C" "D" "A" "C" "D" "A" "E" "B" "D" "E"
A B C D E
3 3 3 3 3
table(print(shuffler(rep(LETTERS[1:5], 3),2)))
[1] "D" "C" "C"
[1] "C" "C" "E"
[1] "C" "A" "C"
[1] "D" "B" "D"
[1] "B" "E" "D"
[1] "C" "A" "E" "D" "A" "B" "C" "E" "A" "B" "D" "C" "B" "E" "D"
A B C D E
3 3 3 3 3
Update:
shuffler <- function(x, min.dist=2){
while(any(unlist(lapply(unique(tl<-tail(l<-shuffle(x, min.dist=min.dist), 2*min.dist)), function(x) diff(which(tl==x))<=min.dist)))){}
l
}
This new version rigorously tests whether the elements in the tail of the vector are at least min.dist apart. The previous version works for min.dist=2, but this version tests more thoroughly.
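For checking any candidate result over its whole length rather than only the tail, a separate validator can be handy. The following is a minimal base R sketch; min_gap is a hypothetical helper name, and "gap" here means the difference in position between two consecutive occurrences of the same value (so a gap of 3 means two other elements in between).

```r
# Hypothetical helper: smallest position difference between equal elements.
min_gap <- function(v) {
  pos  <- split(seq_along(v), v)                     # positions of each distinct value
  gaps <- unlist(lapply(pos, diff), use.names = FALSE)
  if (length(gaps) == 0) Inf else min(gaps)          # Inf when nothing repeats
}

# One of the outputs shown above
v <- c("C", "A", "E", "D", "A", "B", "C", "E", "A", "B", "D", "C", "B", "E", "D")
min_gap(v)       # 3
min_gap(v) > 2   # TRUE: every repeat is more than min.dist = 2 positions away
```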
If your data is large, then it may be (way) faster to rely on probability to do that kind of task.
Here's an example:
prob_shuffler = function(x, min.dist = 2){
n = length(x)
res = sample(x)
OK = FALSE
# We loop until we have a solution
while(!OK){
OK = TRUE
for(i in 1:min.dist){
# We check if identical elements are 'i' steps away
pblm = res[1:(n-i)] == res[-(1:i)]
if(any(pblm)){
if(sum(pblm) >= (n - i)/2){
# back to square 1
res = sample(x)
} else {
# we pair each identical element with
# an extra one
extra = sample(which(!pblm), sum(pblm))
id_reshuffle = c(which(pblm), extra)
res[id_reshuffle] = sample(res[id_reshuffle])
}
# We recheck from the beginning
OK = FALSE
break
}
}
}
res
}
Even though the while loop looks scary, in practice convergence is fast. Of course, the lower the probability of two identical characters ending up within min.dist of each other, the faster the convergence.
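The key step in the function above is the lagged comparison res[1:(n-i)] == res[-(1:i)]. Here is a small sketch of what it returns on a toy vector; clashes_at is a hypothetical wrapper name used only for illustration.

```r
# Hypothetical wrapper around the comparison used in prob_shuffler:
# element j is TRUE when res[j] equals the element i steps after it.
clashes_at <- function(res, i) {
  n <- length(res)
  res[1:(n - i)] == res[-(1:i)]
}

res <- c("A", "B", "A", "C", "C")
clashes_at(res, 1)   # FALSE FALSE FALSE  TRUE   (the two adjacent "C"s)
clashes_at(res, 2)   #  TRUE FALSE FALSE         ("A" at positions 1 and 3)
```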
The current solutions by @Abdessabour Mtk and @Carles Sans Fuentes work but, depending on the size of the input data, quickly become prohibitively slow. Here's a benchmark:
library(microbenchmark)
x = rep(c(letters, LETTERS), 10)
length(x)
#> [1] 520
microbenchmark(prob_shuffler(x, 1), shuffler_am(x, 1), shuffler_csf(x, 1), times = 10)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> prob_shuffler(x, 1) 87.001 111.501 155.071 131.801 192.401 264.401 10
#> shuffler_am(x, 1) 17218.100 18041.900 20324.301 18740.351 22296.301 26495.200 10
#> shuffler_csf(x, 1) 86771.401 88550.501 118185.581 95582.001 98781.601 341826.701 10
microbenchmark(prob_shuffler(x, 2), shuffler_am(x, 2), shuffler_csf(x, 2), times = 10)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> prob_shuffler(x, 2) 140.1 195.201 236.3312 245.252 263.202 354.101 10
#> shuffler_am(x, 2) 18886.2 19526.901 22967.6409 21021.151 26758.800 29133.400 10
#> shuffler_csf(x, 2) 86078.1 92209.901 97151.0609 97612.251 99850.101 107981.401 10
microbenchmark(prob_shuffler(x, 3), shuffler_am(x, 3), shuffler_csf(x, 3), times = 10)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> prob_shuffler(x, 3) 318.001 450.402 631.5312 573.352 782.2 1070.401 10
#> shuffler_am(x, 3) 19003.501 19622.300 23314.4808 20784.551 28281.5 32885.101 10
#> shuffler_csf(x, 3) 87692.701 96152.202 101233.5411 100925.201 108034.7 113814.901 10
Two things stand out: a) as expected, the speed of prob_shuffler depends on min.dist, while the other methods are much less sensitive to it; b) prob_shuffler is about 100-fold faster for just 520 observations (and it scales).
Of course, if the probability of having two identical characters within min.dist of each other is extremely high, then the recursive methods should be faster. But in most practical cases, the probability method is faster.
I hope this answer works for you. It uses only base R. I have left the print statements in (commented out) in case you want to check it line by line:
x <- rep(LETTERS[1:5], 3) # Create example vector
shuffle <- function(x, min_dist=3){
#init variables
result<-c() # result vector
count<-0
vec_use<-x
vec_keep<-c()
for(i in 1:length(x)){
# print(paste0("iteration =", i))
if (count>min_dist){
valback<-vec_keep[1]
# print(paste0("value to be returned:", valback))
ntimes_valback<-(table(vec_keep)[valback])
vec_use<- c(vec_use,rep(valback,ntimes_valback))
# print(paste0("vec_use after giving back valbak =", valback))
# print(paste0(vec_use,","))
vec_keep <- vec_keep[!vec_keep %in% valback]
# print(paste0("vec_keep after removing valback =", valback))
# print(paste0(vec_keep,","))
}
val<-sample(vec_use,1)
# print(paste0("val = ",val))#remove value
vec_keep<- c(vec_keep,x[x %in% val])
vec_keep<-vec_keep[1:(length(vec_keep)-1)]#removing 1 letter
# print(paste0("vec_keep ="))
# print(paste0(vec_keep,","))
vec_use <- vec_use[!vec_use %in% val]
# print(paste0("vec_use ="))
# print(paste0(vec_use,","))
result[i]<-val
count<-count+1
}
return(result)
}
shuffle(x)
"C" "D" "B" "E" "C" "A" "B" "D" "E" "A" "C" "D" "B" "E" "C"
How to shuffle values in a matrix or data frame so that each value only occurs once in each row/column in R?
For example, here is what I have:
A, A, A, A
B, B, B, B
C, C, C, C
D, D, D, D
E, E, E, E
F, F, F, F
G, G, G, G
What I Want:
A, B, C, D
B, C, D, E
C, D, E, F
D, E, F, G
E, F, G, A
F, G, A, B
G, A, B, C
I either need to shift the column up/down and wrap it to maintain my inputs or do it through some sort of shuffle function that shuffles the columns in a way that still ensures the rows do not repeat.
You can try outer + %% to produce a circular shift:
v <- LETTERS[1:4]
out <- matrix(
v[outer(seq_along(v) - 1, seq_along(v) - 1, `+`) %% length(v) + 1],
nrow = length(v)
)
such that
> out
[,1] [,2] [,3] [,4]
[1,] "A" "B" "C" "D"
[2,] "B" "C" "D" "A"
[3,] "C" "D" "A" "B"
[4,] "D" "A" "B" "C"
Update
If your matrix is not square, you can try code like the following:
v <- LETTERS[1:7]
out <- t(
head(
matrix(
v[outer(seq_along(v) - 1, seq_along(v) - 1, `+`) %% length(v) + 1],
nrow = length(v)
), 4
)
)
which gives
> out
[,1] [,2] [,3] [,4]
[1,] "A" "B" "C" "D"
[2,] "B" "C" "D" "E"
[3,] "C" "D" "E" "F"
[4,] "D" "E" "F" "G"
[5,] "E" "F" "G" "A"
[6,] "F" "G" "A" "B"
[7,] "G" "A" "B" "C"
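As a variation on the same idea, the non-square case can also be built directly by giving outer a shorter second index vector, which avoids building the full square matrix first. This is only a sketch; shift_matrix and its n_col argument are hypothetical names.

```r
# Hypothetical helper: length(v) rows, n_col columns, each column shifted by one.
shift_matrix <- function(v, n_col) {
  stopifnot(n_col <= length(v))   # more columns than values would repeat within a row
  idx <- outer(seq_along(v) - 1, seq_len(n_col) - 1, `+`) %% length(v) + 1
  matrix(v[idx], nrow = length(v))
}

out <- shift_matrix(LETTERS[1:7], 4)
out[1, ]   # "A" "B" "C" "D"
out[5, ]   # "E" "F" "G" "A"
```

Passing sample(v) instead of v should give a randomized ordering while keeping the no-repeat property, since every row and column is still a window of one circular sequence.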
Using the variables alpha and key, encrypt ptext into a variable named ctext using a substitution cipher
I have a text file that I split into a vector of single characters:
ptext <- strsplit(ptext,split = "", fixed = TRUE)
ptext <- unlist(ptext)
I also created a key for this cipher
key <- "ZGYHXIWJVKULTMSARBQCPDOENF"
key <- unlist(strsplit(key,""))
and an Alphabet vector for the key
alpha <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
alpha <- toupper(alpha)
alpha <- unlist(strsplit(alpha,""))
Now my goal is to replace every character in the ptext vector with the letter in key that corresponds to its position in alpha (example: A in alpha corresponds to Z in key, so all A's in the text would be replaced by a Z).
I know I am supposed to match alpha against key:
cipher <- match(key,alpha)
Now my issue is that ptext contains over 1000 characters. How can I replace all the letters in that vector?
You could use chartr which will avoid splitting the string and pasting back.
ptext <- 'REQWDSFFFSLK'
alpha <- 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
key <- 'ZGYHXIWJVKULTMSARBQCPDOENF'
chartr(alpha, key, ptext)
#[1] "BXROHQIIIQLU"
Here, every R is replaced with B, every E with X, and so on for each character.
Basically, you need to do what you are doing with the cipher but apply that to each letter of ptext. You can either leave as a vector of single strings or put back together as desired, for example using paste0 below.
set.seed(123)
ptext <- strsplit(LETTERS[sample(26, 100, T)],split = "", fixed = TRUE)
ptext <- unlist(ptext)
key <- "ZGYHXIWJVKULTMSARBQCPDOENF"
key <- unlist(strsplit(key,""))
alpha <- unlist(strsplit(LETTERS,""))
encoded <- sapply(ptext, function(x) key[match(x, alpha)])
encoded
#> O S N C J R V K E T N V Y Z E S Y Y I C
#> "S" "Q" "M" "Y" "K" "B" "D" "U" "X" "C" "M" "D" "N" "F" "X" "Q" "N" "N" "V" "Y"
#> H Z G J I S D N Q K G U L O J M G I I J
#> "J" "F" "W" "K" "V" "Q" "H" "M" "R" "U" "W" "P" "L" "S" "K" "T" "W" "V" "V" "K"
#> W U G U F Y B E H L M R A Y Y F U O I O
#> "O" "P" "W" "P" "I" "N" "G" "X" "J" "L" "T" "B" "Z" "N" "N" "I" "P" "S" "V" "S"
#> Z P T F K H V V G P Q V R Q B D M E V S
#> "F" "A" "C" "I" "U" "J" "D" "D" "W" "A" "R" "D" "B" "R" "G" "H" "T" "X" "D" "Q"
#> Y T V Y N Y W C H P L Y N C N G C W V Z
#> "N" "C" "D" "N" "M" "N" "O" "Y" "J" "A" "L" "N" "M" "Y" "M" "W" "Y" "O" "D" "F"
paste0(encoded, collapse = "")
#> [1] "SQMYKBDUXCMDNFXQNNVYJFWKVQHMRUWPLSKTWVVKOPWPINGXJLTBZNNIPSVSFACIUJDDWARDBRGHTXDQNCDNMNOYJALNMYMWYODF"
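Since match is already vectorized, the lookup can also be done in one step without sapply. The sketch below additionally passes through characters that are not in alpha (spaces, punctuation), which would otherwise become NA; that handling is my assumption about what is wanted, and the sample text "HELLO WORLD" is made up.

```r
alpha <- LETTERS
key   <- strsplit("ZGYHXIWJVKULTMSARBQCPDOENF", "")[[1]]
ptext <- strsplit("HELLO WORLD", "")[[1]]   # made-up sample text

idx   <- match(ptext, alpha)                # NA for characters not in alpha
ctext <- ifelse(is.na(idx), ptext, key[idx])
paste(ctext, collapse = "")
# [1] "JXLLS OSBLH"

# Decrypting is the same lookup with alpha and key swapped
back <- match(ctext, key)
paste(ifelse(is.na(back), ctext, alpha[back]), collapse = "")
# [1] "HELLO WORLD"
```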
I have a set $S$ with cardinality $M$. I would like to generate all subsets of $S$ (i.e. the part of its power set) with cardinality at most $K$, where $K \le M$.
I used R to create power sets, but it does not provide an option to restrict the result to this case. Since $S$ is really large (500 elements), for my problem I only need to compute the subsets with cardinality at most 5.
Can someone help me do this in R?
With |S| = 500, k cannot be very large. For k = 0, 1, 2, 3, 4, 5, this is how many subsets of size at most k there are:
cumsum(sapply(0:5, choose, n = 500))
## [1] 1 501 125251 20833751 2593864876 257838552476
Now, turning to the code, note that combn(x = S, m = i, simplify = FALSE) gives all subsets of size i, so:
# test data
S <- head(letters, 4)
k <- 2
subsets_k <- do.call("c", lapply(0:k, combn, x = S, simplify = FALSE))
giving all subsets of 0, 1 or k=2 elements:
> subsets_k
[[1]]
character(0)
[[2]]
[1] "a"
[[3]]
[1] "b"
[[4]]
[1] "c"
[[5]]
[1] "d"
[[6]]
[1] "a" "b"
[[7]]
[1] "a" "c"
[[8]]
[1] "a" "d"
[[9]]
[1] "b" "c"
[[10]]
[1] "b" "d"
[[11]]
[1] "c" "d"
or we can represent these as a character vector of comma-separated elements:
sapply(subsets_k, toString)
## [1] "" "a" "b" "c" "d" "a, b" "a, c" "a, d" "b, c" "b, d" "c, d"
or directly:
unlist(sapply(0:k, function(i) combn(S, i, FUN = toString)))
## [1] "" "a" "b" "c" "d" "a, b" "a, c" "a, d" "b, c" "b, d" "c, d"
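Given how quickly the count grows, it may be worth checking the number of subsets before enumerating them. The wrapper below is only a sketch: subsets_upto and the max_n limit are hypothetical, and the enumeration itself is the same do.call/combn expression as above. One caveat to keep in mind is that combn treats a single positive number x as seq_len(x), so S should be a vector with more than one element (or a character vector).

```r
# Hypothetical guard: refuse to enumerate when the result would be too large.
subsets_upto <- function(S, k, max_n = 1e6) {
  n_sub <- sum(choose(length(S), 0:k))   # number of subsets of size 0..k
  if (n_sub > max_n) {
    stop(sprintf("%.0f subsets requested, limit is %.0f", n_sub, max_n))
  }
  do.call("c", lapply(0:k, combn, x = S, simplify = FALSE))
}

length(subsets_upto(head(letters, 4), 2))   # 11
# subsets_upto(seq_len(500), 5) stops with an error before enumerating anything
```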
My apologies for the somewhat confusing title (any suggestions for improvement are welcome).
Suppose I have a list that contains several (e.g. four) lists, in each of which I would like to store 20 objects later on:
mylist <- vector(mode="list",length=4)
names(mylist) <- c("One","Two","Three","Four")
mylist$One <- mylist$Two <- mylist$Three <- mylist$Four <- vector(mode="list",
length=20)
I would like to define the names of those objects beforehand. Of course, I can do that as follows:
names(mylist$One) <- c("A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q","R","S","T")
names(mylist$Two) <- names(mylist$Three) <- names(mylist$Four) <- names(mylist$One)
But if the number of lists increases (as is the case in my actual data), this becomes rather cumbersome, so I was trying to do this with a function such as lapply:
mylist <- lapply(mylist,FUN=function(x) {names(x) <-
c("A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q","R","S","T")})
This, however, does not give me the same result, and I cannot figure out what I am overlooking here. Any suggestions?
Thanks!
You need to return a value in your lapply call:
mylist <- lapply(mylist,FUN=function(x) {names(x) <-
c("A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q","R","S","T")
x ## <- note the x here; you could also use return(x)
})
mylist
# $One
# A B C D E F G H I J K L M N O P Q R S T
# "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
#
# $Two
# A B C D E F G H I J K L M N O P Q R S T
# "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
#
# $Three
# A B C D E F G H I J K L M N O P Q R S T
# "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
#
# $Four
# A B C D E F G H I J K L M N O P Q R S T
# "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
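The reason the original attempt fails is that the value of an assignment such as names(x) <- value is value itself, so a function ending in that assignment returns the names rather than the renamed object. A short sketch of this, together with setNames as a compact alternative; no_return is a hypothetical function name, and replicate is used here only to build the example list.

```r
# A function whose last expression is an assignment returns the assigned value
no_return <- function(x) { names(x) <- c("a", "b") }
no_return(list(1, 2))   # returned invisibly; assign it to see the value
res <- no_return(list(1, 2))
res
# [1] "a" "b"

# setNames() returns the renamed object, so it works directly with lapply
mylist <- replicate(4, vector(mode = "list", length = 20), simplify = FALSE)
names(mylist) <- c("One", "Two", "Three", "Four")
mylist <- lapply(mylist, setNames, nm = LETTERS[1:20])
names(mylist$Three)[1:3]
# [1] "A" "B" "C"
```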
This is my implementation, which I think produces the results you are expecting:
mylist <- vector(mode="list",length=4)
names(mylist) <- c("One","Two","Three","Four")
mylist$One <- mylist$Two <- mylist$Three <- mylist$Four <- vector(mode="list",length=20)
renameList <- function(mylist,k){
names(mylist) <- LETTERS[1:k]
return(mylist)
}
mylist2 <- lapply(mylist, function(x) renameList(x,20))
# > str(mylist2)
# List of 4
# $ One :List of 20
# ..$ A: NULL
# ..$ B: NULL
# ..$ C: NULL