Save the results of combn() into a dataframe - r

Calling combn() in the console prints a list of results:
> combn(3, 2, simplify = FALSE)
[[1]]
[1] 1 2
[[2]]
[1] 1 3
[[3]]
[1] 2 3
How can I save the result in a data frame with one column that holds the results?
Example of the new dataframe:
1 2
1 3
2 3

Probably not the most elegant, but this should work:
data.frame(t(data.frame(combn(3, 2, simplify = FALSE))),row.names = NULL)
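A shorter route may be to skip simplify = FALSE: by default combn() returns a matrix with one combination per column, so transposing it gives one combination per row. The sketch below also builds a true single-column version by pasting each combination into a string; the names pairs_df, one_col and combo are made up for illustration.

```r
# combn() returns a matrix with one combination per column
m <- combn(3, 2)

# Transpose so that each combination becomes a row, then convert
pairs_df <- as.data.frame(t(m))
pairs_df
#   V1 V2
# 1  1  2
# 2  1  3
# 3  2  3

# Single character column, one combination per row
# ("combo" is a hypothetical column name)
one_col <- data.frame(combo = apply(m, 2, paste, collapse = " "))
```

The first form keeps the numbers usable for further computation; the second matches the "one column" layout literally but stores text.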

Related

Unlisting a list of list while keeping second list names

I would like to unlist a list of list while keeping the names of the second list.
For example if have a list like this:
$`listA`
$`listA_a`
[1] 1 2
$`listA_g`
[1] 1 2
$`listB`
$`listB_b`
[1] 1 2
I would like to obtain this list:
$`listA_a`
[1] 1 2
$`listA_g`
[1] 1 2
$`listB_b`
[1] 1 2
I know there is an argument in unlist to keep names (use.names = T, which is true by default),
however it keeps the names of the first list and adds a number if there are several elements ("listA1", "listA2", "listB").
(This is an example, but in my code the elements of the list are plots, so I cannot use a data.frame or anything similar. I cannot easily reconstruct the names, as they contain information about the data used for the plots.)
Thank you very much for your help!
Pernille
Try this approach. You can use unlist() with recursive = F to flatten only one level, and then clean up the names. Here is the code:
#Data
List <- list(listA = list(listA_a = c(1, 2), listA_g = c(1, 2)), listB = list(
listB_b = c(1, 2)))
#Code
L <- unlist(List,recursive = F)
names(L) <- gsub(".*\\.","", names(L) )
L
Output:
L
$listA_a
[1] 1 2
$listA_g
[1] 1 2
$listB_b
[1] 1 2
Or the simpler version without regex (many thanks and credits to @markus):
#Code 2
L <- unlist(unname(List),recursive = F)
Output:
L
$listA_a
[1] 1 2
$listA_g
[1] 1 2
$listB_b
[1] 1 2
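To see why the unname() version works, it may help to compare the names that unlist() produces with and without the outer names. This is only an illustrative sketch using the same List as above; the variable flat is made up.

```r
List <- list(listA = list(listA_a = c(1, 2), listA_g = c(1, 2)),
             listB = list(listB_b = c(1, 2)))

# With outer names present, unlist() builds "outer.inner" names
names(unlist(List, recursive = FALSE))
# [1] "listA.listA_a" "listA.listA_g" "listB.listB_b"

# Dropping the outer names first leaves only the inner names
flat <- unlist(unname(List), recursive = FALSE)
names(flat)
# [1] "listA_a" "listA_g" "listB_b"
```

Unlike the gsub() approach, this does not touch the inner names, so it should also be safe when those names themselves contain dots.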
We could use rrapply() from the rrapply package:
library(rrapply)
rrapply(List, how = 'flatten')
#$listA_a
#[1] 1 2
#$listA_g
#[1] 1 2
#$listB_b
#[1] 1 2
data
List <- list(listA = list(listA_a = c(1, 2), listA_g = c(1, 2)), listB = list(
listB_b = c(1, 2)))
Another option is flatten() from the purrr package (here applied to the same List as above):
> purrr::flatten(List)
$listA_a
[1] 1 2
$listA_g
[1] 1 2
$listB_b
[1] 1 2
Another option would be to make use of e.g. Reduce to concatenate the sublists:
list_of_lists <- list(
listA = list(listA_a = c(1, 2), listA_g = c(1, 2)),
listB = list(listB_b = c(1, 2)
))
Reduce(c, list_of_lists)
#> $listA_a
#> [1] 1 2
#>
#> $listA_g
#> [1] 1 2
#>
#> $listB_b
#> [1] 1 2

remove certain vectors from a list

I want to remove certain vectors from a list. I have for example this:
a<-c(1,2,5)
b<-c(1,1,1)
c<-c(1,2,3,4)
d<-c(1,2,3,4,5)
exampleList<-list(a,b,c,d)
exampleList returns of course:
[[1]]
[1] 1 2 5
[[2]]
[1] 1 1 1
[[3]]
[1] 1 2 3 4
[[4]]
[1] 1 2 3 4 5
Is there a way to remove certain vectors from a list in R? I want to remove all vectors in the list exampleList which contain both 1 and 5 (so not vectors which contain only 1 or only 5, but both). Thanks in advance!
Use Filter:
filteredList <- Filter(function(v) !(1 %in% v & 5 %in% v), exampleList)
print(filteredList)
#> [[1]]
#> [1] 1 1 1
#>
#> [[2]]
#> [1] 1 2 3 4
Filter uses a functional style. The first argument you pass is a function that returns TRUE for an element you want to keep in the list, and FALSE for an element you want to remove from the list. The second argument is just the list itself.
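The same idea can be written with a named predicate, which may read more clearly when the condition gets longer. In this sketch has_both is a hypothetical helper name, and Negate() (base R) flips it so that Filter() keeps the vectors that do not contain both values.

```r
exampleList <- list(c(1, 2, 5), c(1, 1, 1), c(1, 2, 3, 4), c(1, 2, 3, 4, 5))

# has_both() is a hypothetical helper: TRUE when a vector contains both 1 and 5
has_both <- function(v) all(c(1, 5) %in% v)

# Negate() flips the predicate, so Filter() keeps vectors lacking 1 or 5
kept <- Filter(Negate(has_both), exampleList)
kept
# [[1]]
# [1] 1 1 1
#
# [[2]]
# [1] 1 2 3 4
```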
We can use sapply on every list element and remove those elements where both the values 1 and 5 are present.
exampleList[!sapply(exampleList, function(x) any(x == 1) & any(x == 5))]
#[[1]]
#[1] 1 1 1
#[[2]]
#[1] 1 2 3 4
Here is a solution in two steps:
exampleList<-list(a=c(1,2,5), b=c(1,1,1), c=c(1,2,3,4), d=c(1,2,3,4,5))
L <- lapply(exampleList, function(x) if (!all(c(1,5) %in% x)) x)
L[!sapply(L, is.null)]
# $b
# [1] 1 1 1
#
# $c
# [1] 1 2 3 4
Here is a one-step variant without any definition of a new function
exampleList[!apply(sapply(exampleList, '%in%', x=c(1,5)), 2, all)]
(... but it has two calls to apply-functions)

Call a specific column from every dataframe from list of dataframes

I would like to extract a specific column from every data frame in a list of data frames. Any ideas? This is my code:
# Create dissimilarity matrix
df.diss<-dist(t(df[,6:11]))
mds.table<-list() # empty list of values
for(i in 1:6){ # For Loop to iterate over a command in a function
a<-mds(pk.diss,ndim=i, type="ratio", verbose=TRUE,itmax=1000)
mds.table[[i]]<-a # Store output in empty list
}
Now here is where I'm having trouble. After storing the values, I'm unable to call a specific column from every dataframe from the list.
# This function should call every $stress column from each data frame.
lapply(mds.table, function(x){
mds.table[[x]]$stress
})
Thanks again!
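You are very close. Inside the function passed to lapply, x is already the list element itself, not its index, so index x directly instead of going back through the list: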
set.seed(1)
l_df <- lapply(1:5, function(x){
data.frame(a = sample(1:5,5), b = sample(1:5,5))
})
lapply(l_df, function(x){
x[['a']]
})
[[1]]
[1] 2 5 4 3 1
[[2]]
[1] 2 1 3 4 5
[[3]]
[1] 5 1 2 4 3
[[4]]
[1] 3 5 2 1 4
[[5]]
[1] 5 3 4 2 1
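Applied to the question, the fix would look roughly like the sketch below. Since the fitted objects are not available here, mds.table is replaced by a stand-in list whose elements each have a stress component, as the question implies; the stress values are made up purely for illustration.

```r
# Stand-in for the list of fitted models: each element is a list with a
# numeric `stress` component (values invented for illustration only)
mds.table <- list(list(stress = 0.31), list(stress = 0.12), list(stress = 0.05))

# x is each element in turn, so index it directly
stress_list <- lapply(mds.table, function(x) x$stress)

# A plain numeric vector is often more convenient; vapply() also checks
# that every result is a single number
stress_vec <- vapply(mds.table, function(x) x$stress, numeric(1))
stress_vec
# [1] 0.31 0.12 0.05
```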

Subset a list (choose matching values for all components)

I am trying to extract certain elements from a list in a way that's equivalent to df[, c(1,4,5)] for a data.frame.
> obj <- list(c(1:5), c(1:5))
> obj
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 1 2 3 4 5
I'm looking for a result like this:
[[1]]
[1] 1 4 5
[[2]]
[1] 1 4 5
I have been experimenting with [], [[]] and [[]][], but those access the list one component at a time.
I've also been trying lapply(obj, c(1,4,5)) and looking at match().
I'm not quite there yet, help would be appreciated.
Thx!
You should call lapply with a function which is run on every list entry:
obj <- list(c(1:5), c(1:5))
lapply(obj, function(x) x[c(1, 4, 5)])
#[[1]]
#[1] 1 4 5
#[[2]]
#[1] 1 4 5
EDi has a great answer, but you can do it by passing the [ function to lapply plus additional arguments:
lapply(obj, '[', c(1, 4, 5))
You can access this and the other "weird" functions in R by quoting them:
?"["
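As a small illustration of why this works: `[` is an ordinary function, so x[i] is just another way of writing a call to it, and lapply passes the extra argument along. The sketch below also shows the same trick with `[[`; the names x and firsts are made up.

```r
obj <- list(1:5, 1:5)
x <- obj[[1]]

# Calling `[` as a function gives the same result as the usual syntax
identical(`[`(x, c(1, 4, 5)), x[c(1, 4, 5)])
# [1] TRUE

# The same trick with `[[` pulls a single element from each component
firsts <- sapply(obj, `[[`, 1)
firsts
# [1] 1 1
```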

Best way to store variable-length data in an R data.frame?

I have some mixed-type data that I would like to store in an R data structure of some sort. Each data point has a set of fixed attributes which may be 1-d numeric, factors, or characters, and also a set of variable length data. For example:
id  phrase                    num_tokens  token_lengths
1   "hello world"             2           5 5
2   "greetings"               1           9
3   "take me to your leader"  4           4 2 2 4 6
The actual values are not all computable from one another, but that's the flavor of the data. The operations I'm going to want to do include subsetting the data based on boolean functions (e.g. something like nchar(data$phrase) > 10 or lapply(data$token_lengths, length) > 2). I'd also like to index and average values in the variable length portion by index. This doesn't work, but something like: mean(data$token_lengths[1], na.rm=TRUE)
I've found I can shoehorn "token_lengths" into a data.frame by making it an array:
d <- data.frame(id=c(1,2,3), ..., token_lengths=as.array(list(c(5,5), 9, c(4,2,2,4,6))))
But is this the best way?
Trying to shoehorn the data into a data frame seems hackish to me. Far better to consider each row as an individual object, then think of the dataset as an array of these objects.
This function converts your data strings to an appropriate format. (This is S3 style code; you may prefer to use one of the 'proper' object oriented systems.)
as.mydata <- function(x)
{
UseMethod("as.mydata")
}
as.mydata.character <- function(x)
{
convert <- function(x)
{
md <- list()
md$phrase = x
spl <- strsplit(x, " ")[[1]]
md$num_words <- length(spl)
md$token_lengths <- nchar(spl)
class(md) <- "mydata"
md
}
lapply(x, convert)
}
Now your whole dataset looks like
mydataset <- as.mydata(c("hello world", "greetings", "take me to your leader"))
mydataset
[[1]]
$phrase
[1] "hello world"
$num_words
[1] 2
$token_lengths
[1] 5 5
attr(,"class")
[1] "mydata"
[[2]]
$phrase
[1] "greetings"
$num_words
[1] 1
$token_lengths
[1] 9
attr(,"class")
[1] "mydata"
[[3]]
$phrase
[1] "take me to your leader"
$num_words
[1] 5
$token_lengths
[1] 4 2 2 4 6
attr(,"class")
[1] "mydata"
You can define a print method to make this look prettier.
print.mydata <- function(x, ...)
{
# print methods should accept ... and return their argument invisibly
cat(x$phrase, "consists of", x$num_words, "words, with", paste(x$token_lengths, collapse=", "), "letters.")
invisible(x)
}
mydataset
[[1]]
hello world consists of 2 words, with 5, 5 letters.
[[2]]
greetings consists of 1 words, with 9 letters.
[[3]]
take me to your leader consists of 5 words, with 4, 2, 2, 4, 6 letters.
The sample operations you wanted to do are fairly straightforward with data in this format.
sapply(mydataset, function(x) nchar(x$phrase) > 10)
[1] TRUE FALSE TRUE
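The other two operations from the question (filtering on the number of tokens, and averaging the i-th token length across records) could be sketched along the same lines. To keep the example self-contained, the records are rebuilt here as plain lists with the same fields; long_ones and mean_token_length are hypothetical names.

```r
# Rebuild the records as plain lists (same fields as the mydata objects)
phrases <- c("hello world", "greetings", "take me to your leader")
mydataset <- lapply(phrases, function(p) {
  spl <- strsplit(p, " ")[[1]]
  list(phrase = p, num_words = length(spl), token_lengths = nchar(spl))
})

# Keep only the records with more than two tokens
long_ones <- Filter(function(x) length(x$token_lengths) > 2, mydataset)

# Mean length of the i-th token across records; records with fewer
# than i tokens yield NA and are dropped by na.rm = TRUE
mean_token_length <- function(dataset, i) {
  mean(sapply(dataset, function(x) x$token_lengths[i]), na.rm = TRUE)
}

mean_token_length(mydataset, 1)  # (5 + 9 + 4) / 3 = 6
mean_token_length(mydataset, 2)  # (5 + 2) / 2 = 3.5
```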
I would just use the data in the "long" format.
E.g.
> d1 <- data.frame(id=1:3, num_words=c(2,1,4), phrase=c("hello world", "greetings", "take me to your leader"))
> d2 <- data.frame(id=c(rep(1,2), rep(2,1), rep(3,5)), token_length=c(5,5,9,4,2,2,4,6))
> d2$tokenid <- with(d2, ave(token_length, id, FUN=seq_along))
> d <- merge(d1,d2)
> subset(d, nchar(phrase) > 10)
  id num_words                 phrase token_length tokenid
1  1         2            hello world            5       1
2  1         2            hello world            5       2
4  3         4 take me to your leader            4       1
5  3         4 take me to your leader            2       2
6  3         4 take me to your leader            2       3
7  3         4 take me to your leader            4       4
8  3         4 take me to your leader            6       5
> with(d, tapply(token_length, id, mean))
  1   2   3
5.0 9.0 3.6
Once the data is in the long format, you can use sqldf or plyr to extract what you want from it.
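Base R may already be enough for the operations in the question. As a sketch on the same long-format data, the block below averages token lengths by position and finds the phrases with more than two tokens; by_pos, n_tokens and ids are made-up names.

```r
d2 <- data.frame(id = c(1, 1, 2, 3, 3, 3, 3, 3),
                 token_length = c(5, 5, 9, 4, 2, 2, 4, 6))
# Position of each token within its phrase
d2$tokenid <- with(d2, ave(token_length, id, FUN = seq_along))

# Mean token length at each position, across all phrases
by_pos <- with(d2, tapply(token_length, tokenid, mean))
by_pos
#   1   2   3   4   5
# 6.0 3.5 2.0 4.0 6.0

# Ids of phrases with more than two tokens
n_tokens <- table(d2$id)
ids <- names(n_tokens)[n_tokens > 2]
ids
# [1] "3"
```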
Another option would be to convert your data frame into a matrix of mode list, where each element of the matrix is a list. Standard array operations (slicing with [, apply(), etc.) would then be applicable.
> d <- data.frame(id=c(1,2,3), num_tokens=c(2,1,4), token_lengths=as.array(list(c(5,5), 9, c(4,2,2,4,6))))
> m <- as.matrix(d)
> mode(m)
[1] "list"
> m[,"token_lengths"]
[[1]]
[1] 5 5
[[2]]
[1] 9
[[3]]
[1] 4 2 2 4 6
> m[3,]
$id
[1] 3
$num_tokens
[1] 4
$token_lengths
[1] 4 2 2 4 6
Since the R data frame structure is based loosely on the SQL table, having each element of the data frame be anything other than an atomic data type is uncommon. However, it can be done, as you've shown, and this linked post describes such an application implemented on a larger scale.
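One common way to build such a column is to wrap the list in I(), which tells data.frame() to store it as is instead of trying to split it into several columns. The following is only a sketch on the example data from the question; n_tokens is a made-up name.

```r
d <- data.frame(
  id = 1:3,
  phrase = c("hello world", "greetings", "take me to your leader"),
  # I() keeps the list as a single column with one vector per row
  token_lengths = I(list(c(5, 5), 9, c(4, 2, 2, 4, 6))),
  stringsAsFactors = FALSE
)

# Number of tokens per row, then the ids of rows with more than two
n_tokens <- sapply(d$token_lengths, length)
d[n_tokens > 2, "id"]
# [1] 3

# Per-row mean of the variable-length part
sapply(d$token_lengths, mean)
# [1] 5.0 9.0 3.6
```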
An alternative is to store your data as a string and have a function to retrieve it, or create a separate function to which the data is attached and extract it using indices stored in your data frame.
> ## alternative 1
> tokens <- function(x,i=TRUE) Map(as.numeric,strsplit(x[i],","))
> d <- data.frame(id=c(1,2,3), token_lengths=c("5,5", "9", "4,2,2,4,6"))
>
> tokens(d$token_lengths)
[[1]]
[1] 5 5
[[2]]
[1] 9
[[3]]
[1] 4 2 2 4 6
> tokens(d$token_lengths,2:3)
[[1]]
[1] 9
[[2]]
[1] 4 2 2 4 6
>
> ## alternative 2
> retrieve <- local({
+ token_lengths <- list(c(5,5), 9, c(4,2,2,4,6))
+ function(i) token_lengths[i]
+ })
>
> d <- data.frame(id=c(1,2,3), token_lengths=1:3)
> retrieve(d$token_lengths[2:3])
[[1]]
[1] 9
[[2]]
[1] 4 2 2 4 6
I would also use strings for the variable length data, but as in the following example: "c(5,5)" for the first phrase. One needs to use eval(parse(text=...)) to carry out computations.
For example, the mean can be computed as follows:
sapply(data$token_lengths,function(str) mean(eval(parse(text=str))))
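A self-contained version of this idea might look like the sketch below, where d and means are hypothetical names and the strings hold the example data from the question. Note that eval(parse(text = ...)) runs whatever code the string contains, so this only makes sense for strings you created yourself.

```r
d <- data.frame(id = 1:3,
                token_lengths = c("c(5,5)", "9", "c(4,2,2,4,6)"),
                stringsAsFactors = FALSE)

# Parse and evaluate each string, then average the resulting vector;
# USE.NAMES = FALSE stops sapply() from naming results after the strings
means <- sapply(d$token_lengths,
                function(str) mean(eval(parse(text = str))),
                USE.NAMES = FALSE)
means
# [1] 5.0 9.0 3.6
```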
