Suppose that I have a list similar to this one:
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
[[1]]
[1] "b"
[[2]]
[1] "d" "c"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "d" "d" "b" "c"
[[5]]
[1] "d" "d" "c" "c" "b"
[[6]]
[1] "b" "d" "b" "d" "c" "c"
[[7]]
[1] "a" "b" "d" "d" "b" "a" "d"
I would like to have vectors of length one given by the element of higher frequency in the list. Notice that is possible to have vectors of length > 1 if there are no duplicates. The frequency table is like this:
table(unlist(out))[order(table(unlist(out)), decreasing = T)]
b c d a
16 14 13 12
The outcome of the example is something like this:
list("b", "c", "b", "b", "b", "b", "b")
REMARK
It is possible to have vectors of length > 1 if there are no duplicates.
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
length(out)
[1] 10
out[[length(out)+1]] <- c("L", "K")
out
[[1]]
[1] "c"
[[2]]
[1] "d" "a"
[[3]]
[1] "c" "b" "a"
[[4]]
[1] "b" "c" "b" "c"
[[5]]
[1] "a" "a" "d" "c" "d"
[[6]]
[1] "d" "b" "d" "d" "d" "a"
[[7]]
[1] "d" "b" "c" "c" "d" "c" "a"
[[8]]
[1] "d" "a" "d" "b" "d" "a" "b" "d"
[[9]]
[1] "a" "b" "b" "b" "c" "c" "a" "c" "d"
[[10]]
[1] "d" "d" "d" "a" "d" "d" "c" "c" "a" "c"
[[11]]
[1] "L" "K"
Expected outcome:
list("c", "d", "c", "c", "d", "d", "d", "d", "d", "d", c("L", "K"))
I believe that this should work for what you are looking for.
# get counts for entire list and order them
myRanks <- sort(table(unlist(out)), decreasing=TRUE)
This produces
myRanks
b c d a
10 9 5 4
# calculate if most popular, then second most popular, ... item shows up for each list item
sapply(out, function(i) names(myRanks)[min(match(i, names(myRanks)))])
[1] "b" "b" "b" "c" "b" "b" "b"
Here, sapply runs through each list item and returns a vector. It applies a function that selects the name of the first element (via min) of the myRanks table that appears in the list element, using match.
In the case of multiple elements having the same count (duplicates) in the myRanks table, the following code should to return a list of the top observations per list item:
sapply(out,
function(i) {
intersect(names(myRanks)[myRanks == max(unique(myRanks[match(i, names(myRanks))]))],
i)})
Here, the names of myRanks that have the same value as the value in the list item with the highest value in myRanks are intersected with the names present in the list item in order to only return values in both sets.
This should work:
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out
#[[1]]
#[1] "b"
#[[2]]
#[1] "c" "b"
#[[3]]
#[1] "b" "b" "b"
#[[4]]
#[1] "d" "c" "c" "d"
#[[5]]
#[1] "d" "b" "a" "a" "c"
#[[6]]
#[1] "a" "b" "c" "b" "c" "c"
#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"
tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]
sapply(out, function(x) intersect(names(tbl), x)[1])
# [1] "b" "b" "b" "c" "b" "b" "b"
[EDIT]
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out[[length(out)+1]] <- c("L", "K")
out
#[[1]]
#[1] "b"
#[[2]]
#[1] "c" "b"
#[[3]]
#[1] "b" "b" "b"
#[[4]]
#[1] "d" "c" "c" "d"
#[[5]]
#[1] "d" "b" "a" "a" "c"
#[[6]]
#[1] "a" "b" "c" "b" "c" "c"
#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"
#[[8]]
#[1] "L" "K"
tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]
#tbl
#b c d a K L
#10 9 5 4 1 1
lapply(out, function(x) names(tbl[tbl==max(tbl[names(tbl) %in% intersect(names(tbl), x)])]))
#[[1]]
#[1] "b"
#[[2]]
#[1] "b"
#[[3]]
#[1] "b"
#[[4]]
#[1] "c"
#[[5]]
#[1] "b"
#[[6]]
#[1] "b"
#[[7]]
#[1] "b"
#[[8]]
#[1] "K" "L"
I have dataframe like this (ID, Frequency A B C D E)
ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1
I want to convert this dataframe into test based document like this (ID and their frequency ABCDE as words in a single column). Then I may use LDA algorithm to identify hot topics for each ID.
ID Text
1 "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
2 "A" "A" "A" "B" "B" "C" "C" "D"
3 "A" "A" "A" "A" "B" "B" "C" "D" "E"
We can use data.table
library(data.table)
DT <- setDT(df1)[,.(list(rep(names(df1)[-1], unlist(.SD)))) ,ID]
DT$V1
#[[1]]
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#[[2]]
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#[[3]]
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
Or a base R option is split
lst <- lapply(split(df1[-1], df1$ID), rep, x=names(df1)[-1])
lst
#$`1`
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#$`2`
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#$`3`
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
If we want to write the 'lst' to csv file, one option is convert the list to data.frame by appending NA at the end to make the length equal while converting to data.frame (as data.frame is a list with equal length (columns))
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
Or use a convenient function from stringi
library(stringi)
res <- stri_list2matrix(lst, byrow=TRUE)
and then use the write.csv
write.csv(res, "yourdata.csv", quote=FALSE, row.names = FALSE)
You can use apply and rep like so:
apply(df[-1], 1, function(i) rep(names(df)[-1], i))
For each row, apply feeds the rep function the number of times to repeat each variable name. This returns a list of vectors:
[[1]]
[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
[[2]]
[1] "A" "A" "A" "B" "B" "C" "C" "D"
[[3]]
[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
Where each list element is a row of your data.frame.
data
df <- read.table(header=T, text="ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1")
I found it hard to formulate the question but I would like to find a clever way (not using loop) to get the following result:
> my.vector = letters[1:6]
> print(my.vector)
[1] "a" "b" "c" "d" "e" "f"
>
> my.list = (rep(list(NA),6))
> for (i in 1:length(my.vector)){
+ x = my.vector[1:i]
+ my.list[[i]] = x
+ }
> print(my.list)
[[1]]
[1] "a"
[[2]]
[1] "a" "b"
[[3]]
[1] "a" "b" "c"
[[4]]
[1] "a" "b" "c" "d"
[[5]]
[1] "a" "b" "c" "d" "e"
[[6]]
[1] "a" "b" "c" "d" "e" "f"
Thanks in advance,
Gabriel.
You can do:
lapply(seq_along(my.vector), head, x = my.vector)
Here's an approach (more verbose than #akrun's, but not dependent on the actual values in the original vector).
split(my.vector[sequence(seq_along(my.vector))],
rep(seq_along(my.vector), seq_along(my.vector)))
## $`1`
## [1] "a"
##
## $`2`
## [1] "a" "b"
##
## $`3`
## [1] "a" "b" "c"
##
## $`4`
## [1] "a" "b" "c" "d"
##
## $`5`
## [1] "a" "b" "c" "d" "e"
##
## $`6`
## [1] "a" "b" "c" "d" "e" "f"
##
If you wanted a matrix instead of a list, you can try:
> x <- t(replicate(length(my.vector), my.vector))
> x[upper.tri(x)] <- ""
> x
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "a" "" "" "" "" ""
[2,] "a" "b" "" "" "" ""
[3,] "a" "b" "c" "" "" ""
[4,] "a" "b" "c" "d" "" ""
[5,] "a" "b" "c" "d" "e" ""
[6,] "a" "b" "c" "d" "e" "f"
We can use
v1 <- my.vector[sequence(seq_along(my.vector))]
split(v1, cumsum(v1=='a'))
So, I have a list, that each elements is a vector. I have another vector of factors which I want to use to merge some of my list elements. As an example, here is some data:
> l <- list("0"=c("a", "b", "c", "a", "a"), "1"=c("c", "b", "c", "a", "c", "c", "c"), "2"=c("b", "b", "b"), "3"=c("d", "d", "a", "b", "d"))
> l
$`0`
[1] "a" "b" "c" "a" "a"
$`1`
[1] "c" "b" "c" "a" "c" "c" "c"
$`2`
[1] "b" "b" "b"
$`3`
[1] "d" "d" "a" "b" "d"
> f <- factor(c(1,2,2,1))
> f
[1] 1 2 2 1
Levels: 1 2
so according to this factor, I want to merge elements (1,4) and (2,3) since they have the same factor labels. My final list should look like this:
list.final:
$`0`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`1`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"
So, only two elements, thereby merging elements 1,4 and 2,3 from the original list. Is there any way to do this using some apply functions?
tapply(l, f, FUN=unlist,use.names=F)
#$`1`
#[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
#$`2`
#[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"
You can split the list on the factors and then recursively call c:
> lapply(split(l, f), function(x) unname(c(x, recursive = TRUE)))
$`1`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`2`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"
in a similar manner you can use unlist
> lapply(split(l, f), function(x) unname(unlist(x)))
$`1`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`2`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"
Since you only have two levels, you could use the list index.
> L <- unname(l)
> list(`0` = unlist(L[f == 1]), `1` = unlist(L[f != 1]))
$`0`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`1`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"
I'm trying to create a list of permutations of a list, such that, for example, perms(list("a", "b", "c")) returns
list(list("a", "b", "c"), list("a", "c", "b"), list("b", "a", "c"),
list("b", "c", "a"), list("c", "a", "b"), list("c", "b", "a"))
I'm not sure how to proceed, any help would be greatly appreciated.
A while back I had to do this in base R without loading any packages.
permutations <- function(n){
if(n==1){
return(matrix(1))
} else {
sp <- permutations(n-1)
p <- nrow(sp)
A <- matrix(nrow=n*p,ncol=n)
for(i in 1:n){
A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
}
return(A)
}
}
Usage:
> matrix(letters[permutations(3)],ncol=3)
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"
combinat::permn will do that work:
> library(combinat)
> permn(letters[1:3])
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "a" "c" "b"
[[3]]
[1] "c" "a" "b"
[[4]]
[1] "c" "b" "a"
[[5]]
[1] "b" "c" "a"
[[6]]
[1] "b" "a" "c"
Note that calculation is huge if the element is large.
base R can also provide the answer:
all <- expand.grid(p1 = letters[1:3], p2 = letters[1:3], p3 = letters[1:3], stringsAsFactors = FALSE)
perms <- all[apply(all, 1, function(x) {length(unique(x)) == 3}),]
You can try permutations() from the gtools package, but unlike permn() from combinat, it doesn't output a list:
> library(gtools)
> permutations(3, 3, letters[1:3])
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"
A solution in base R, no dependencies on other packages:
> getPermutations <- function(x) {
if (length(x) == 1) {
return(x)
}
else {
res <- matrix(nrow = 0, ncol = length(x))
for (i in seq_along(x)) {
res <- rbind(res, cbind(x[i], Recall(x[-i])))
}
return(res)
}
}
> getPermutations(letters[1:3])
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"
I hope this helps.
# Another recursive implementation
# for those who like to roll their own, no package required
permutations <- function( x, prefix = c() )
{
if(length(x) == 0 ) return(prefix)
do.call(rbind, sapply(1:length(x), FUN = function(idx) permutations( x[-idx], c( prefix, x[idx])), simplify = FALSE))
}
permutations(letters[1:3])
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "a" "c" "b"
#[3,] "b" "a" "c"
#[4,] "b" "c" "a"
#[5,] "c" "a" "b"
#[6,] "c" "b" "a"
Try:
> a = letters[1:3]
> eg = expand.grid(a,a,a)
> eg[!(eg$Var1==eg$Var2 | eg$Var2==eg$Var3 | eg$Var1==eg$Var3),]
Var1 Var2 Var3
6 c b a
8 b c a
12 c a b
16 a c b
20 b a c
22 a b c
As suggested by #Adrian in comments, last line can be replaced by:
eg[apply(eg, 1, anyDuplicated) == 0, ]
A fun solution "probabilistic" using sample for base R:
elements <- c("a", "b", "c")
k <- length(elements)
res=unique(t(sapply(1:200, function(x) sample(elements, k))))
# below, check you have all the permutations you need (if not, try again)
nrow(res) == factorial(k)
res
basically you call many random samples, hoping to get them all, and you unique them.
We can use base function combn with a little modifcation:
combn_n <- function(x) {
m <- length(x) - 1 # number of elements to choose: n-1
xr <- rev(x) # reversed x
part_1 <- rbind(combn(x, m), xr, deparse.level = 0)
part_2 <- rbind(combn(xr, m), x, deparse.level = 0)
cbind(part_1, part_2)
}
combn_n(letters[1:3])
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "a" "a" "b" "c" "c" "b"
[2,] "b" "c" "c" "b" "a" "a"
[3,] "c" "b" "a" "a" "b" "c"
Behold, the purrr 🐾 solution:
> map(1:3, ~ c('a', 'b', 'c')) %>%
cross() %>%
keep(~ length(unique(.x)) == 3) %>%
map(unlist)
#> [[1]]
#> [1] "c" "b" "a"
#>
#> [[2]]
#> [1] "b" "c" "a"
#>
#> [[3]]
#> [1] "c" "a" "b"
#>
#> [[4]]
#> [1] "a" "c" "b"
#>
#> [[5]]
#> [1] "b" "a" "c"
#>
#> [[6]]
#> [1] "a" "b" "c"
In case this helps, there is the "arrangements" package, that allows you to simply do :
> abc = letters[1:3]
> permutations(abc)
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"
A generic version of rnso's answer is:
get_perms <- function(x){
stopifnot(is.atomic(x)) # for the matrix call to make sense
out <- as.matrix(expand.grid(
replicate(length(x), x, simplify = FALSE), stringsAsFactors = FALSE))
out[apply(out,1, anyDuplicated) == 0, ]
}
Here are two examples:
get_perms(letters[1:3])
#R> Var1 Var2 Var3
#R> [1,] "c" "b" "a"
#R> [2,] "b" "c" "a"
#R> [3,] "c" "a" "b"
#R> [4,] "a" "c" "b"
#R> [5,] "b" "a" "c"
#R> [6,] "a" "b" "c"
get_perms(letters[1:4])
#R> Var1 Var2 Var3 Var4
#R> [1,] "d" "c" "b" "a"
#R> [2,] "c" "d" "b" "a"
#R> [3,] "d" "b" "c" "a"
#R> [4,] "b" "d" "c" "a"
#R> [5,] "c" "b" "d" "a"
#R> [6,] "b" "c" "d" "a"
#R> [7,] "d" "c" "a" "b"
#R> [8,] "c" "d" "a" "b"
#R> [9,] "d" "a" "c" "b"
#R> [10,] "a" "d" "c" "b"
#R> [11,] "c" "a" "d" "b"
#R> [12,] "a" "c" "d" "b"
#R> [13,] "d" "b" "a" "c"
#R> [14,] "b" "d" "a" "c"
#R> [15,] "d" "a" "b" "c"
#R> [16,] "a" "d" "b" "c"
#R> [17,] "b" "a" "d" "c"
#R> [18,] "a" "b" "d" "c"
#R> [19,] "c" "b" "a" "d"
#R> [20,] "b" "c" "a" "d"
#R> [21,] "c" "a" "b" "d"
#R> [22,] "a" "c" "b" "d"
#R> [23,] "b" "a" "c" "d"
#R> [24,] "a" "b" "c" "d"
One can also slightly alter Rick's answer by using lapply, only doing a single rbind, and reduce the number of [s]/[l]apply calls:
permutations <- function(x, prefix = c()){
if(length(x) == 1) # was zero before
return(list(c(prefix, x)))
out <- do.call(c, lapply(1:length(x), function(idx)
permutations(x[-idx], c(prefix, x[idx]))))
if(length(prefix) > 0L)
return(out)
do.call(rbind, out)
}
What about
pmsa <- function(l) {
pms <- function(n) if(n==1) return(list(1)) else unlist(lapply(pms(n-1),function(v) lapply(0:(n-1),function(k) append(v,n,k))),recursive = F)
lapply(pms(length(l)),function(.) l[.])
}
This gives a list. Then
pmsa(letters[1:3])