I have data like this:
ID = c(rep("ID1",3), rep("ID2",2), "ID3", rep("ID4",2))
item = c("a","b","c","a","c","a","b","a")
df = data.frame(ID,item)
ID1 a
ID1 b
ID1 c
ID2 a
ID2 c
ID3 a
ID4 b
ID4 a
and I need it as a list like the following so that it can be converted to "transactions":
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "a" "c"
[[3]]
[1] "a"
[[4]]
[1] "b" "a"
I tried:
lapply(split(item, ID), function(x) as.list(x))
but each item still ends up as its own list element, instead of all items of an ID being together in one vector.
Any ideas on how to accomplish the above format?
Use unstack (note that it returns a data frame rather than a list when every ID has the same number of items):
df <- data.frame(ID,item)
unstack(df, item~ID)
# $ID1
# [1] "a" "b" "c"
#
# $ID2
# [1] "a" "c"
#
# $ID3
# [1] "a"
#
# $ID4
# [1] "b" "a"
Based on the expected output, you don't need as.list: split() already returns one vector per ID, and setNames(..., NULL) removes the ID names:
setNames(split(as.character(df$item), df$ID), NULL)
#[[1]]
#[1] "a" "b" "c"
#[[2]]
#[1] "a" "c"
#[[3]]
#[1] "a"
#[[4]]
#[1] "b" "a"
Using your approach, made to work:
lapply(split(df, df$ID), function(u) u$item)
#$ID1
#[1] "a" "b" "c"
#$ID2
#[1] "a" "c"
#$ID3
#[1] "a"
#$ID4
#[1] "b" "a"
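One caveat that applies to all three answers: split() orders the groups by factor levels, which for character IDs means alphabetical order (so "ID10" would sort before "ID2"). A minimal sketch, assuming you want the baskets in the order in which the IDs first appear, sets the level order explicitly; the name `baskets` is only illustrative:

```r
ID   <- c(rep("ID1", 3), rep("ID2", 2), "ID3", rep("ID4", 2))
item <- c("a", "b", "c", "a", "c", "a", "b", "a")
df   <- data.frame(ID, item, stringsAsFactors = FALSE)

# Levels in order of first appearance, so split() keeps the original ID order
grp <- factor(df$ID, levels = unique(df$ID))

# One character vector per ID; unname() drops the "ID1", "ID2", ... names
baskets <- unname(split(df$item, grp))
str(baskets)
```

If the arules package is installed, a list of character vectors like this should be coercible with as(baskets, "transactions").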
I have a dataset of 6 individuals: A,B,C,D,E,F
I want to group these into two groups of three individuals and have done so with the combn function in R:
m <- combn(n, 3)
This gives me all 20 possible groups, in which individuals occur in multiple groups. From this set of groups I then want to find all possible combinations of results in which each individual is used only once.
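I would like to do this using combinations without repetition, C(n, r) = n! / (r! (n - r)!). Since C(6, 3) = 20 and every split is counted twice (abc + def is the same split as def + abc), I would therefore get 20 / 2 = 10 results that look like this: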
abc + def
abd + cef
abe + cdf
abf + cde
acd + bef
ace + bdf
acf + bde
ade + bcf
adf + bce
aef + bcd
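I am not sure how to code this in R starting from the list of groups I have generated.
Edit: I generated the dataset I am using with the following code:
individuals <- c("a","b","c","d","e","f")
n <- length(individuals)
x <- 3
# number of ways to choose x of n individuals: n! / (x! (n - x)!)
comb = function(n, x) {
  factorial(n) / factorial(n-x) / factorial(x)
}
comb(n,x)
# all groups of 3, as indices into 'individuals' (one group per column)
(m <- combn(n, 3))
numbers <- m
letters <- individuals  # note: this masks R's built-in 'letters' constant
# replace each index by the corresponding individual
for (i in 1:length(numbers)) {
  m[i] <- letters[numbers[i]]
}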
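As an aside, the index-to-letter loop can be avoided: combn() also accepts the vector of individuals directly and then returns the groups as a character matrix, one group per column. A minimal sketch with the same `individuals` vector:

```r
individuals <- c("a", "b", "c", "d", "e", "f")

# Each column is one group of 3; there are choose(6, 3) = 20 columns
m <- combn(individuals, 3)
dim(m)
m[, 1:3]
```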
In base R:
Create all combinations of 3 letters and store them in a list (simplify = FALSE)
Create all pairs of those groups (again as a list)
Filter the list to keep only the pairs whose two groups have no element in common
individuals <- c("a","b","c","d","e","f")
# the native pipe |> and the \(x) lambda syntax require R >= 4.1.0
combn(individuals, 3, simplify = FALSE) |>
  combn(m = 2, simplify = FALSE) |>
  Filter(f = \(x) !any(x[[1]] %in% x[[2]]))
output
[[1]]
[[1]][[1]]
[1] "a" "b" "c"
[[1]][[2]]
[1] "d" "e" "f"
[[2]]
[[2]][[1]]
[1] "a" "b" "d"
[[2]][[2]]
[1] "c" "e" "f"
[[3]]
[[3]][[1]]
[1] "a" "b" "e"
[[3]][[2]]
[1] "c" "d" "f"
[[4]]
[[4]][[1]]
[1] "a" "b" "f"
[[4]][[2]]
[1] "c" "d" "e"
[[5]]
[[5]][[1]]
[1] "a" "c" "d"
[[5]][[2]]
[1] "b" "e" "f"
[[6]]
[[6]][[1]]
[1] "a" "c" "e"
[[6]][[2]]
[1] "b" "d" "f"
[[7]]
[[7]][[1]]
[1] "a" "c" "f"
[[7]][[2]]
[1] "b" "d" "e"
[[8]]
[[8]][[1]]
[1] "a" "d" "e"
[[8]][[2]]
[1] "b" "c" "f"
[[9]]
[[9]][[1]]
[1] "a" "d" "f"
[[9]][[2]]
[1] "b" "c" "e"
[[10]]
[[10]][[1]]
[1] "a" "e" "f"
[[10]][[2]]
[1] "b" "c" "d"
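To get from this nested output to the "abc + def" notation used in the question, each pair can be collapsed into one string. A minimal sketch (the names `disjoint` and `split_labels` are illustrative), written without the native pipe so it also runs on R versions older than 4.1:

```r
individuals <- c("a", "b", "c", "d", "e", "f")

# All pairs of 3-person groups, keeping only pairs with no shared individual
groups   <- combn(individuals, 3, simplify = FALSE)
disjoint <- Filter(function(x) !any(x[[1]] %in% x[[2]]),
                   combn(groups, 2, simplify = FALSE))

# Collapse each group to e.g. "abc", then join the two groups with " + "
split_labels <- vapply(disjoint, function(p) {
  paste(vapply(p, paste, character(1), collapse = ""), collapse = " + ")
}, character(1))
split_labels
```

Because combn() only forms pairs (i, j) with i < j, each split appears once, giving choose(6, 3) / 2 = 10 labels in the same order as the list in the question.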
Let's say I have this vector:
letters[1:7]
[1] "a" "b" "c" "d" "e" "f" "g"
I would like to split it into non-overlapping pieces whose lengths increase by 1 (1, 2, 3, ...), keeping whatever is left over at the end (e.g. piece 4 should have 4 elements, but only one is left, and I'd like to keep it), like this:
[[1]]
[1] "a"
[[2]]
[1] "b" "c"
[[3]]
[1] "d" "e" "f"
[[4]]
[1] "g"
Please do let me know any direction to solve this, thank you!
Example vector:
x <- letters[1:7]
Solution:
n <- ceiling(0.5 * sqrt(1 + 8 * length(x)) - 0.5)
split(x, rep(1:n, 1:n)[1:length(x)])
#$`1`
#[1] "a"
#
#$`2`
#[1] "b" "c"
#
#$`3`
#[1] "d" "e" "f"
#
#$`4`
#[1] "g"
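The formula for n comes from triangular numbers: the first n pieces hold 1 + 2 + ... + n = n(n+1)/2 elements, so n is the smallest positive integer for which that total reaches L = length(x). Solving the quadratic inequality gives the expression used in the code:

```latex
\frac{n(n+1)}{2} \ge L
\iff n^2 + n - 2L \ge 0
\iff n \ge \frac{\sqrt{1 + 8L} - 1}{2}
\qquad\Rightarrow\qquad
n = \left\lceil \frac{\sqrt{1 + 8L}}{2} - \frac{1}{2} \right\rceil
```

For L = 7 this gives sqrt(57) ≈ 7.55, so n = ceiling(3.27) = 4; rep(1:4, 1:4) then has 10 entries, and [1:length(x)] keeps the first 7.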
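Something quick and dirty:
splitter = function(x) {
  n = length(x)
  # i = number of complete groups 1, 2, ..., i; whatever is left
  # over (at most i elements) becomes group i + 1
  i = 1
  while ( i * (i + 1L) / 2L < (n-i) ) i = i + 1
  # default every position to the leftover group i + 1 ...
  out = rep(i+1, n)
  # ... then overwrite the first i*(i+1)/2 positions with groups 1..i
  out[1:(i * (i + 1L) / 2L)] = rep(1:i, 1:i)
  unname(split(x, out))
}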
splitter(x)
[[1]]
[1] "a"
[[2]]
[1] "b" "c"
[[3]]
[1] "d" "e" "f"
[[4]]
[1] "g"
x <- letters[1:7]
splt <- rep(seq(length(x)), seq(length(x)))[seq(length(x))]
split(x, splt)
#> $`1`
#> [1] "a"
#>
#> $`2`
#> [1] "b" "c"
#>
#> $`3`
#> [1] "d" "e" "f"
#>
#> $`4`
#> [1] "g"
Created on 2022-08-04 by the reprex package (v2.0.1)
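A variation on the approaches above, sketched as an alternative rather than taken from any of them: compute each element's group directly from the triangular-number bound instead of building the longer rep(1:n, 1:n) vector. Position k goes into the smallest group g with g(g+1)/2 >= k, and unname() gives the unnamed [[1]], [[2]], ... layout from the question:

```r
x <- letters[1:7]

# Group index for each position k: smallest g with g * (g + 1) / 2 >= k
k   <- seq_along(x)
grp <- ceiling((sqrt(8 * k + 1) - 1) / 2)

res <- unname(split(x, grp))
res
```

Because sqrt() of a perfect square is exact in double precision, the positions that close a group (k = 1, 3, 6, ...) land exactly on g rather than one above it.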
Suppose that I have a list similar to this one:
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
[[1]]
[1] "b"
[[2]]
[1] "d" "c"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "d" "d" "b" "c"
[[5]]
[1] "d" "d" "c" "c" "b"
[[6]]
[1] "b" "d" "b" "d" "c" "c"
[[7]]
[1] "a" "b" "d" "d" "b" "a" "d"
For each element of the list, I would like a vector of length one holding that element's item with the highest frequency in the whole list. The frequency table is:
table(unlist(out))[order(table(unlist(out)), decreasing = T)]
b c d a
16 14 13 12
For this example, the expected outcome is:
list("b", "c", "b", "b", "b", "b", "b")
REMARK
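It is possible to get vectors of length > 1 when several items are tied for the highest frequency, e.g. items that occur only once in the whole list, as with c("L", "K") below: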
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
length(out)
[1] 10
out[[length(out)+1]] <- c("L", "K")
out
[[1]]
[1] "c"
[[2]]
[1] "d" "a"
[[3]]
[1] "c" "b" "a"
[[4]]
[1] "b" "c" "b" "c"
[[5]]
[1] "a" "a" "d" "c" "d"
[[6]]
[1] "d" "b" "d" "d" "d" "a"
[[7]]
[1] "d" "b" "c" "c" "d" "c" "a"
[[8]]
[1] "d" "a" "d" "b" "d" "a" "b" "d"
[[9]]
[1] "a" "b" "b" "b" "c" "c" "a" "c" "d"
[[10]]
[1] "d" "d" "d" "a" "d" "d" "c" "c" "a" "c"
[[11]]
[1] "L" "K"
Expected outcome:
list("c", "d", "c", "c", "d", "d", "d", "d", "d", "d", c("L", "K"))
I believe that this should work for what you are looking for.
# get counts for entire list and order them
myRanks <- sort(table(unlist(out)), decreasing=TRUE)
This produces
myRanks
b c d a
10 9 5 4
# for each list element, return the name of its highest-ranked item
sapply(out, function(i) names(myRanks)[min(match(i, names(myRanks)))])
[1] "b" "b" "b" "c" "b" "b" "b"
Here, sapply runs through each list element and returns a vector. For each element, match() finds the position of each of its items in myRanks (which is sorted by decreasing count), and min() picks the best-ranked one, whose name is returned.
If several items share the same count (ties) in the myRanks table, the following code should return the top items for each list element. It uses lapply rather than sapply so that the result is always a list, even when every element has a single top item:
lapply(out,
       function(i) {
         intersect(names(myRanks)[myRanks == max(unique(myRanks[match(i, names(myRanks))]))],
                   i)})
Here, the names of myRanks that have the same value as the value in the list item with the highest value in myRanks are intersected with the names present in the list item in order to only return values in both sets.
This should work:
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out
#[[1]]
#[1] "b"
#[[2]]
#[1] "c" "b"
#[[3]]
#[1] "b" "b" "b"
#[[4]]
#[1] "d" "c" "c" "d"
#[[5]]
#[1] "d" "b" "a" "a" "c"
#[[6]]
#[1] "a" "b" "c" "b" "c" "c"
#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"
tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]
sapply(out, function(x) intersect(names(tbl), x)[1])
# [1] "b" "b" "b" "c" "b" "b" "b"
[EDIT]
set.seed(12731)
out <- lapply(1:sample.int(10, 1), function(x){sample(letters[1:4], x, replace = T)})
out[[length(out)+1]] <- c("L", "K")
out
#[[1]]
#[1] "b"
#[[2]]
#[1] "c" "b"
#[[3]]
#[1] "b" "b" "b"
#[[4]]
#[1] "d" "c" "c" "d"
#[[5]]
#[1] "d" "b" "a" "a" "c"
#[[6]]
#[1] "a" "b" "c" "b" "c" "c"
#[[7]]
#[1] "a" "c" "d" "b" "d" "c" "b"
#[[8]]
#[1] "L" "K"
tbl <- table(unlist(out))[order(table(unlist(out)), decreasing = T)]
#tbl
#b c d a K L
#10 9 5 4 1 1
lapply(out, function(x) {
  cand <- tbl[names(tbl) %in% x]   # counts of the items present in x only
  names(cand)[cand == max(cand)]   # every item tied for the highest count
})
#[[1]]
#[1] "b"
#[[2]]
#[1] "b"
#[[3]]
#[1] "b"
#[[4]]
#[1] "c"
#[[5]]
#[1] "b"
#[[6]]
#[1] "b"
#[[7]]
#[1] "b"
#[[8]]
#[1] "K" "L"
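Putting the pieces together, here is a minimal sketch (the helper name `top_items` is illustrative) that returns every item tied for the highest overall count, checked against the second example from the question. Indexing the table with unique(x) looks up only the items that actually occur in that element, so ties with items elsewhere in the list cannot leak into the result:

```r
# The list from the question's second example, written out literally
out <- list("c", c("d", "a"), c("c", "b", "a"), c("b", "c", "b", "c"),
            c("a", "a", "d", "c", "d"), c("d", "b", "d", "d", "d", "a"),
            c("d", "b", "c", "c", "d", "c", "a"),
            c("d", "a", "d", "b", "d", "a", "b", "d"),
            c("a", "b", "b", "b", "c", "c", "a", "c", "d"),
            c("d", "d", "d", "a", "d", "d", "c", "c", "a", "c"),
            c("L", "K"))

# Overall frequency of every item across the whole list
freq <- table(unlist(out))

top_items <- function(x) {
  cand <- freq[unique(x)]          # counts of the items present in x
  names(cand)[cand == max(cand)]   # keep every item tied for the top count
}
top <- lapply(out, top_items)
```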
I have a data frame like this (an ID column followed by the frequencies of A, B, C, D and E):
ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1
I want to convert this data frame into a text-based document like the one below, where each ID's A to E frequencies are expanded into repeated words in a single column. I can then use the LDA algorithm to identify hot topics for each ID.
ID Text
1 "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
2 "A" "A" "A" "B" "B" "C" "C" "D"
3 "A" "A" "A" "A" "B" "B" "C" "D" "E"
We can use data.table (df1 is the data frame from the question):
library(data.table)
DT <- setDT(df1)[,.(list(rep(names(df1)[-1], unlist(.SD)))) ,ID]
DT$V1
#[[1]]
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#[[2]]
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#[[3]]
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
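Or, in base R, use split. Run this on the plain data.frame: after setDT(df1), df1 is a data.table, where df1[-1] would drop the first row instead of the first column.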
lst <- lapply(split(df1[-1], df1$ID), rep, x=names(df1)[-1])
lst
#$`1`
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#$`2`
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#$`3`
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
To write lst to a CSV file, one option is to pad each vector with NA up to the maximum length and bind them row-wise into a matrix, because the columns of a data.frame (a list of equal-length vectors) must all have the same length:
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
Or use the convenience function stri_list2matrix from stringi:
library(stringi)
res <- stri_list2matrix(lst, byrow=TRUE)
and then call write.csv:
write.csv(res, "yourdata.csv", quote=FALSE, row.names = FALSE)
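You can use apply and rep like so:
apply(df[-1], 1, function(i) rep(names(df)[-1], i))
For each row, apply passes rep the number of times to repeat each variable name. Because the rows have different totals, this returns a list of vectors (if every row had the same total, apply would simplify the result to a matrix; in R >= 4.1.0, simplify = FALSE keeps it a list):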
[[1]]
[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
[[2]]
[1] "A" "A" "A" "B" "B" "C" "C" "D"
[[3]]
[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
Each list element corresponds to one row of your data frame.
Data:
df <- read.table(header=T, text="ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1")
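If the LDA tooling you use expects one space-separated string per document (an assumption; the question shows the words as separate quoted strings), the repeated names can be collapsed with paste(). A minimal sketch building the two-column ID/Text data frame from the question (the names `words`, `text` and `docs` are illustrative):

```r
df <- read.table(header = TRUE, text = "ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1")

words <- names(df)[-1]

# For each row, repeat every column name by its count, then join with spaces
text <- apply(df[-1], 1, function(counts) paste(rep(words, counts), collapse = " "))

docs <- data.frame(ID = df$ID, Text = text, stringsAsFactors = FALSE)
docs
```

Since paste() always returns a single string per row, apply returns a character vector here regardless of the row totals.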
I have a list of lists (I think): each element of the list, res[[i]], is itself a list, something like:
[[1]]
[[1]]$a
[[1]]$a$`1`
"aa" "bb" "cc"
[[1]]$a$`2`
"aa" "bb" "cc" "dd"
[[2]]
[[2]]$a
[[2]]$a$`1`
"aa" "bb" "cc"
[[2]]$a$`2`
"aa" "bb" "cc" "dd"
...
I would like to merge all the objects in a new list in which I only have something like:
"aa" "bb" "cc"
"aa" "bb" "cc" "dd"
"aa" "bb" "cc" "cc"
...
Any ideas?
It looks like you want a "flat" list. For this, you can use unlist with recursive = FALSE, but depending on how deep the list is, that might be tedious. Here's an example:
Your data:
myList <- list(list(a = list("1" = letters[1:3], "2" = letters[1:4])),
list(a = list("1" = letters[1:3], "2" = letters[1:4])))
myList
# [[1]]
# [[1]]$a
# [[1]]$a$`1`
# [1] "a" "b" "c"
#
# [[1]]$a$`2`
# [1] "a" "b" "c" "d"
#
#
#
# [[2]]
# [[2]]$a
# [[2]]$a$`1`
# [1] "a" "b" "c"
#
# [[2]]$a$`2`
# [1] "a" "b" "c" "d"
Using nested unlists:
unlist(unlist(myList, recursive=FALSE), recursive=FALSE)
# $a.1
# [1] "a" "b" "c"
#
# $a.2
# [1] "a" "b" "c" "d"
#
# $a.1
# [1] "a" "b" "c"
#
# $a.2
# [1] "a" "b" "c" "d"
There is also this nifty function called LinearizeNestedList (https://sites.google.com/site/akhilsbehl/geekspace/articles/r/linearize_nested_lists_in_r) that can be downloaded/sourced in R and used as follows (for lists of any depth of nesting):
LinearizeNestedList(myList, NameSep=".")
# $`1.a.1`
# [1] "a" "b" "c"
#
# $`1.a.2`
# [1] "a" "b" "c" "d"
#
# $`2.a.1`
# [1] "a" "b" "c"
#
# $`2.a.2`
# [1] "a" "b" "c" "d"
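For nesting of arbitrary depth in base R, a small recursive helper is another option. This is a minimal sketch (the name `flatten_leaves` is hypothetical, not from any package) that collects every non-list element into one flat, unnamed list, which matches the output shape asked for:

```r
myList <- list(list(a = list("1" = letters[1:3], "2" = letters[1:4])),
               list(a = list("1" = letters[1:3], "2" = letters[1:4])))

# Return a list containing every non-list leaf of x, depth-first
flatten_leaves <- function(x) {
  if (!is.list(x)) return(list(x))               # a leaf: wrap it in a list
  do.call(c, lapply(unname(x), flatten_leaves))   # recurse, then concatenate
}

flat <- flatten_leaves(myList)
str(flat)
```

Note that data frames are lists too, so this helper would break a nested data frame into its columns.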
Edit
It appears this question is a duplicate of "How to flatten a list to a list without coercion?".
See that question and set of answers for other useful solutions.