How to convert frequency into text by using R?

How to convert frequency into text by using R? - r

I have dataframe like this (ID, Frequency A B C D E)
ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1
I want to convert this dataframe into test based document like this (ID and their frequency ABCDE as words in a single column). Then I may use LDA algorithm to identify hot topics for each ID.
ID Text
1 "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
2 "A" "A" "A" "B" "B" "C" "C" "D"
3 "A" "A" "A" "A" "B" "B" "C" "D" "E"

We can use data.table
library(data.table)
DT <- setDT(df1)[,.(list(rep(names(df1)[-1], unlist(.SD)))) ,ID]
DT$V1
#[[1]]
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#[[2]]
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#[[3]]
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
Or a base R option is split
lst <- lapply(split(df1[-1], df1$ID), rep, x=names(df1)[-1])
lst
#$`1`
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#$`2`
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#$`3`
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
If we want to write the 'lst' to csv file, one option is convert the list to data.frame by appending NA at the end to make the length equal while converting to data.frame (as data.frame is a list with equal length (columns))
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
Or use a convenient function from stringi
library(stringi)
res <- stri_list2matrix(lst, byrow=TRUE)
and then use the write.csv
write.csv(res, "yourdata.csv", quote=FALSE, row.names = FALSE)

You can use apply and rep like so:
apply(df[-1], 1, function(i) rep(names(df)[-1], i))
For each row, apply feeds the rep function the number of times to repeat each variable name. This returns a list of vectors:
[[1]]
[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
[[2]]
[1] "A" "A" "A" "B" "B" "C" "C" "D"
[[3]]
[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
Where each list element is a row of your data.frame.
data
df <- read.table(header=T, text="ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1")

Related

circular permutation of similar objects

I need to an R code for circular permutation of similar objects which defines this code exactly.
The number of circular permutations that can be formed using 'n' objects out of which 'p' are identical and of one kind and 'q' are identical and of another kind.
(n-1)!/p!q!
This is the best code which I found but it is not exactly what I want
library(arrangements)
permutations(x = c("A","B","C"), freq = c(2,1,1))
output:
[,1] [,2] [,3] [,4]
[1,] "A" "A" "B" "C"
[2,] "A" "A" "C" "B"
[3,] "A" "B" "A" "C"
[4,] "A" "B" "C" "A"
[5,] "A" "C" "A" "B"
[6,] "A" "C" "B" "A"
[7,] "B" "A" "A" "C"
[8,] "B" "A" "C" "A"
[9,] "B" "C" "A" "A"
[10,] "C" "A" "A" "B"
[11,] "C" "A" "B" "A"
[12,] "C" "B" "A" "A"
I do not want "A" "A" are beside each other.

It turns out that a recursive function works well for this problem. The function takes the journey so far, figures out which remaining towns are possible to visit next, and then calls itself for each of these. If there are no remaining towns it reports the route.
# recursive function to visit remaining towns
journey <- function(remaining, visited){
# possible towns to visit next
possible <- setdiff(remaining, tail(visited, 1))
if (length(possible)==0){
if (length(remaining)==0){
# report and store journey
print(visited)
routei <<- routei + 1
routes[[routei]] <<- visited
} else {
# route failed to visit all towns
}
} else {
# loop through options
for (i in possible){
# continue journey
journey(remaining[-match(i, remaining)], c(visited, i))
}
}
}
remaining <- c("A", "A", "B", "B", "C")
visited <- character(0)
routes <- vector("list", length(remaining)^2)
routei <- 0
journey(remaining, visited)
#> [1] "A" "B" "A" "B" "C"
#> [1] "A" "B" "A" "C" "B"
#> [1] "A" "B" "C" "A" "B"
#> [1] "A" "B" "C" "B" "A"
#> [1] "A" "C" "B" "A" "B"
#> [1] "B" "A" "B" "A" "C"
#> [1] "B" "A" "B" "C" "A"
#> [1] "B" "A" "C" "A" "B"
#> [1] "B" "A" "C" "B" "A"
#> [1] "B" "C" "A" "B" "A"
#> [1] "C" "A" "B" "A" "B"
#> [1] "C" "B" "A" "B" "A"
Created on 2019-07-22 by the reprex package (v0.3.0)

What is the fastest way to reduce elements of a list by frequency?

Suppose that I have a list similar to this one:
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
[[1]]
[1] "b"
[[2]]
[1] "d" "c"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "d" "d" "b" "c"
[[5]]
[1] "d" "d" "c" "c" "b"
[[6]]
[1] "b" "d" "b" "d" "c" "c"
[[7]]
[1] "a" "b" "d" "d" "b" "a" "d"
I would like to have vectors of length one given by the element of higher frequency in the list. Notice that is possible to have vectors of length > 1 if there are no duplicates. The frequency table is like this:
table(unlist(out))[order(table(unlist(out)), decreasing = T)]
b c d a
16 14 13 12
The outcome of the example is something like this:
list("b", "c", "b", "b", "b", "b", "b")
REMARK
It is possible to have vectors of length > 1 if there are no duplicates.
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
length(out)
[1] 10
out[[length(out)+1]] <- c("L", "K")
out
[[1]]
[1] "c"
[[2]]
[1] "d" "a"
[[3]]
[1] "c" "b" "a"
[[4]]
[1] "b" "c" "b" "c"
[[5]]
[1] "a" "a" "d" "c" "d"
[[6]]
[1] "d" "b" "d" "d" "d" "a"
[[7]]
[1] "d" "b" "c" "c" "d" "c" "a"
[[8]]
[1] "d" "a" "d" "b" "d" "a" "b" "d"
[[9]]
[1] "a" "b" "b" "b" "c" "c" "a" "c" "d"
[[10]]
[1] "d" "d" "d" "a" "d" "d" "c" "c" "a" "c"
[[11]]
[1] "L" "K"
Expected outcome:
list("c", "d", "c", "c", "d", "d", "d", "d", "d", "d", c("L", "K"))

I believe that this should work for what you are looking for.
# get counts for entire list and order them
myRanks <- sort(table(unlist(out)), decreasing=TRUE)
This produces
myRanks
b c d a
10 9 5 4
# calculate if most popular, then second most popular, ... item shows up for each list item
sapply(out, function(i) names(myRanks)[min(match(i, names(myRanks)))])
[1] "b" "b" "b" "c" "b" "b" "b"
Here, sapply runs through each list item and returns a vector. It applies a function that selects the name of the first element (via min) of the myRanks table that appears in the list element, using match.
In the case of multiple elements having the same count (duplicates) in the myRanks table, the following code should to return a list of the top observations per list item:
sapply(out,
function(i) {
intersect(names(myRanks)[myRanks == max(unique(myRanks[match(i, names(myRanks))]))],
i)})
Here, the names of myRanks that have the same value as the value in the list item with the highest value in myRanks are intersected with the names present in the list item in order to only return values in both sets.

This should work:
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out
#[[1]]
#[1] "b"
#[[2]]
#[1] "c" "b"
#[[3]]
#[1] "b" "b" "b"
#[[4]]
#[1] "d" "c" "c" "d"
#[[5]]
#[1] "d" "b" "a" "a" "c"
#[[6]]
#[1] "a" "b" "c" "b" "c" "c"
#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"
tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]
sapply(out, function(x) intersect(names(tbl), x)[1])
# [1] "b" "b" "b" "c" "b" "b" "b"
[EDIT]
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out[[length(out)+1]] <- c("L", "K")
out
#[[1]]
#[1] "b"
#[[2]]
#[1] "c" "b"
#[[3]]
#[1] "b" "b" "b"
#[[4]]
#[1] "d" "c" "c" "d"
#[[5]]
#[1] "d" "b" "a" "a" "c"
#[[6]]
#[1] "a" "b" "c" "b" "c" "c"
#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"
#[[8]]
#[1] "L" "K"
tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]
#tbl
#b c d a K L
#10 9 5 4 1 1
lapply(out, function(x) names(tbl[tbl==max(tbl[names(tbl) %in% intersect(names(tbl), x)])]))
#[[1]]
#[1] "b"
#[[2]]
#[1] "b"
#[[3]]
#[1] "b"
#[[4]]
#[1] "c"
#[[5]]
#[1] "b"
#[[6]]
#[1] "b"
#[[7]]
#[1] "b"
#[[8]]
#[1] "K" "L"

data.frame to list in R

I have data like this:
ID = c(rep("ID1",3), rep("ID2",2), "ID3", rep("ID4",2))
item = c("a","b","c","a","c","a","b","a")
df = data.frame(ID,item)
ID1 a
ID1 b
ID1 c
ID2 a
ID2 c
ID3 a
ID4 b
ID4 a
and I would need it as a list like this to be transformed to "transactions":
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "a" "c"
[[3]]
[1] "a"
[[4]]
[1] "b" "a"
I tried:
lapply(split(item, ID), function(x) as.list(x))
but the items are still on separate "rows" and not one after the other.
Any ideas on how to accomplish the above format?

Use unstack:
df <- data.frame(ID,item)
unstack(df, item~ID)
# $ID1
# [1] "a" "b" "c"
#
# $ID2
# [1] "a" "c"
#
# $ID3
# [1] "a"
#
# $ID4
# [1] "b" "a"

Based on the expected output, you don't need to use as.list
setNames(split(as.character(df1$item),df1$ID) , NULL)
#[[1]]
#[1] "a" "b" "c"
#[[2]]
#[1] "a" "c"
#[[3]]
#[1] "a"
#[[4]]
#[1] "b" "a"

Using your approach and make it working:
> lapply(split(df, df$ID), function(u) u$item)
#$ID1
#[1] "a" "b" "c"
#$ID2
#[1] "a" "c"
#$ID3
#[1] "a"
#$ID4
#[1] "b" "a"

R: merge elements of a list according to a vector of factors

So, I have a list, that each elements is a vector. I have another vector of factors which I want to use to merge some of my list elements. As an example, here is some data:
> l <- list("0"=c("a", "b", "c", "a", "a"), "1"=c("c", "b", "c", "a", "c", "c", "c"), "2"=c("b", "b", "b"), "3"=c("d", "d", "a", "b", "d"))
> l
$`0`
[1] "a" "b" "c" "a" "a"
$`1`
[1] "c" "b" "c" "a" "c" "c" "c"
$`2`
[1] "b" "b" "b"
$`3`
[1] "d" "d" "a" "b" "d"
> f <- factor(c(1,2,2,1))
> f
[1] 1 2 2 1
Levels: 1 2
so according to this factor, I want to merge elements (1,4) and (2,3) since they have the same factor labels. My final list should look like this:
list.final:
$`0`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`1`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"
So, only two elements, thereby merging elements 1,4 and 2,3 from the original list. Is there any way to do this using some apply functions?

tapply(l, f, FUN=unlist,use.names=F)
#$`1`
#[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
#$`2`
#[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"

You can split the list on the factors and then recursively call c:
> lapply(split(l, f), function(x) unname(c(x, recursive = TRUE)))
$`1`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`2`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"
in a similar manner you can use unlist
> lapply(split(l, f), function(x) unname(unlist(x)))
$`1`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`2`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"

Since you only have two levels, you could use the list index.
> L <- unname(l)
> list(`0` = unlist(L[f == 1]), `1` = unlist(L[f != 1]))
$`0`
[1] "a" "b" "c" "a" "a" "d" "d" "a" "b" "d"
$`1`
[1] "c" "b" "c" "a" "c" "c" "c" "b" "b" "b"

Generating all distinct permutations of a list in R

I'm trying to create a list of permutations of a list, such that, for example, perms(list("a", "b", "c")) returns
list(list("a", "b", "c"), list("a", "c", "b"), list("b", "a", "c"),
list("b", "c", "a"), list("c", "a", "b"), list("c", "b", "a"))
I'm not sure how to proceed, any help would be greatly appreciated.

A while back I had to do this in base R without loading any packages.
permutations <- function(n){
if(n==1){
return(matrix(1))
} else {
sp <- permutations(n-1)
p <- nrow(sp)
A <- matrix(nrow=n*p,ncol=n)
for(i in 1:n){
A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
}
return(A)
}
}
Usage:
> matrix(letters[permutations(3)],ncol=3)
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"

combinat::permn will do that work:
> library(combinat)
> permn(letters[1:3])
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "a" "c" "b"
[[3]]
[1] "c" "a" "b"
[[4]]
[1] "c" "b" "a"
[[5]]
[1] "b" "c" "a"
[[6]]
[1] "b" "a" "c"
Note that calculation is huge if the element is large.

base R can also provide the answer:
all <- expand.grid(p1 = letters[1:3], p2 = letters[1:3], p3 = letters[1:3], stringsAsFactors = FALSE)
perms <- all[apply(all, 1, function(x) {length(unique(x)) == 3}),]

You can try permutations() from the gtools package, but unlike permn() from combinat, it doesn't output a list:
> library(gtools)
> permutations(3, 3, letters[1:3])
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"

A solution in base R, no dependencies on other packages:
> getPermutations <- function(x) {
if (length(x) == 1) {
return(x)
}
else {
res <- matrix(nrow = 0, ncol = length(x))
for (i in seq_along(x)) {
res <- rbind(res, cbind(x[i], Recall(x[-i])))
}
return(res)
}
}
> getPermutations(letters[1:3])
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"
I hope this helps.

# Another recursive implementation
# for those who like to roll their own, no package required
permutations <- function( x, prefix = c() )
{
if(length(x) == 0 ) return(prefix)
do.call(rbind, sapply(1:length(x), FUN = function(idx) permutations( x[-idx], c( prefix, x[idx])), simplify = FALSE))
}
permutations(letters[1:3])
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "a" "c" "b"
#[3,] "b" "a" "c"
#[4,] "b" "c" "a"
#[5,] "c" "a" "b"
#[6,] "c" "b" "a"

Try:
> a = letters[1:3]
> eg = expand.grid(a,a,a)
> eg[!(eg$Var1==eg$Var2 | eg$Var2==eg$Var3 | eg$Var1==eg$Var3),]
Var1 Var2 Var3
6 c b a
8 b c a
12 c a b
16 a c b
20 b a c
22 a b c
As suggested by #Adrian in comments, last line can be replaced by:
eg[apply(eg, 1, anyDuplicated) == 0, ]

A fun solution "probabilistic" using sample for base R:
elements <- c("a", "b", "c")
k <- length(elements)
res=unique(t(sapply(1:200, function(x) sample(elements, k))))
# below, check you have all the permutations you need (if not, try again)
nrow(res) == factorial(k)
res
basically you call many random samples, hoping to get them all, and you unique them.

We can use base function combn with a little modifcation:
combn_n <- function(x) {
m <- length(x) - 1 # number of elements to choose: n-1
xr <- rev(x) # reversed x
part_1 <- rbind(combn(x, m), xr, deparse.level = 0)
part_2 <- rbind(combn(xr, m), x, deparse.level = 0)
cbind(part_1, part_2)
}
combn_n(letters[1:3])
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "a" "a" "b" "c" "c" "b"
[2,] "b" "c" "c" "b" "a" "a"
[3,] "c" "b" "a" "a" "b" "c"

Behold, the purrr 🐾 solution:
> map(1:3, ~ c('a', 'b', 'c')) %>%
cross() %>%
keep(~ length(unique(.x)) == 3) %>%
map(unlist)
#> [[1]]
#> [1] "c" "b" "a"
#>
#> [[2]]
#> [1] "b" "c" "a"
#>
#> [[3]]
#> [1] "c" "a" "b"
#>
#> [[4]]
#> [1] "a" "c" "b"
#>
#> [[5]]
#> [1] "b" "a" "c"
#>
#> [[6]]
#> [1] "a" "b" "c"

In case this helps, there is the "arrangements" package, that allows you to simply do :
> abc = letters[1:3]
> permutations(abc)
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "c" "b"
[3,] "b" "a" "c"
[4,] "b" "c" "a"
[5,] "c" "a" "b"
[6,] "c" "b" "a"

A generic version of rnso's answer is:
get_perms <- function(x){
stopifnot(is.atomic(x)) # for the matrix call to make sense
out <- as.matrix(expand.grid(
replicate(length(x), x, simplify = FALSE), stringsAsFactors = FALSE))
out[apply(out,1, anyDuplicated) == 0, ]
}
Here are two examples:
get_perms(letters[1:3])
#R> Var1 Var2 Var3
#R> [1,] "c" "b" "a"
#R> [2,] "b" "c" "a"
#R> [3,] "c" "a" "b"
#R> [4,] "a" "c" "b"
#R> [5,] "b" "a" "c"
#R> [6,] "a" "b" "c"
get_perms(letters[1:4])
#R> Var1 Var2 Var3 Var4
#R> [1,] "d" "c" "b" "a"
#R> [2,] "c" "d" "b" "a"
#R> [3,] "d" "b" "c" "a"
#R> [4,] "b" "d" "c" "a"
#R> [5,] "c" "b" "d" "a"
#R> [6,] "b" "c" "d" "a"
#R> [7,] "d" "c" "a" "b"
#R> [8,] "c" "d" "a" "b"
#R> [9,] "d" "a" "c" "b"
#R> [10,] "a" "d" "c" "b"
#R> [11,] "c" "a" "d" "b"
#R> [12,] "a" "c" "d" "b"
#R> [13,] "d" "b" "a" "c"
#R> [14,] "b" "d" "a" "c"
#R> [15,] "d" "a" "b" "c"
#R> [16,] "a" "d" "b" "c"
#R> [17,] "b" "a" "d" "c"
#R> [18,] "a" "b" "d" "c"
#R> [19,] "c" "b" "a" "d"
#R> [20,] "b" "c" "a" "d"
#R> [21,] "c" "a" "b" "d"
#R> [22,] "a" "c" "b" "d"
#R> [23,] "b" "a" "c" "d"
#R> [24,] "a" "b" "c" "d"
One can also slightly alter Rick's answer by using lapply, only doing a single rbind, and reduce the number of [s]/[l]apply calls:
permutations <- function(x, prefix = c()){
if(length(x) == 1) # was zero before
return(list(c(prefix, x)))
out <- do.call(c, lapply(1:length(x), function(idx)
permutations(x[-idx], c(prefix, x[idx]))))
if(length(prefix) > 0L)
return(out)
do.call(rbind, out)
}

What about
pmsa <- function(l) {
pms <- function(n) if(n==1) return(list(1)) else unlist(lapply(pms(n-1),function(v) lapply(0:(n-1),function(k) append(v,n,k))),recursive = F)
lapply(pms(length(l)),function(.) l[.])
}
This gives a list. Then
pmsa(letters[1:3])

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How to convert frequency into text by using R? - r

Related

circular permutation of similar objects

What is the fastest way to reduce elements of a list by frequency?

data.frame to list in R

R: merge elements of a list according to a vector of factors

Generating all distinct permutations of a list in R

Categories

Resources