Subsetting Elements in a "Hypothetical" List - r

I found this R function (Algorithm to calculate power set (all possible subsets) of a set in R) that can return the "power set" for a set of letters. Below I assign it to f() then use it to return a power set in a list.
f <- function(set) {
n <- length(set)
masks <- 2^(1:n-1)
lapply( 1:2^n-1, function(u) set[ bitwAnd(u, masks) != 0 ] )
}
results = f(LETTERS[1:3])
[[1]]
character(0)
[[2]]
[1] "A"
[[3]]
[1] "B"
[[4]]
[1] "A" "B"
[[5]]
[1] "C"
[[6]]
[1] "A" "C"
[[7]]
[1] "B" "C"
[[8]]
[1] "A" "B" "C"
I could then draw some random sequence of letters from the power set object using an index value:
> random = sample.int(length(results), 1)
[1] 6
> results[random]
[[1]]
[1] "A" "C"
Suppose now I want to make a list of the power set for all 26 letters. Since this list would have 2^26 = 67108864 elements, it would be too large to store in memory:
# too big
big_results = f(LETTERS[1:26])
But suppose I were to instead generate some random number I know is an index of the big_results results:
big_random = sample.int(67108864, 1)
13626980
Is there some way of knowing which subset of letters "big_random" would correspond to in the "big_results" list, without actually computing "big_results" in full?
For example:
# too big to run (hypothetical list)
big_results[big_random]
Could some enumeration or recursion formula be used to figure out some pattern and then somehow determine that "13626980" corresponds to the letters "P H R T L D U Z" for instance?
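One way to approach this, sketched under the assumption that the hypothetical list follows the same ordering as f() above: element i of the list is built only from the bits of i - 1, so it can be computed on its own without materialising the rest of the list. The helper name nth_subset is hypothetical and not part of the original function.

```r
# Hypothetical helper: return the i-th element of f(set) without building the list.
# Assumes the same ordering as f(): element i corresponds to the bitmask i - 1.
nth_subset <- function(set, i) {
  n <- length(set)
  stopifnot(i >= 1, i <= 2^n)
  masks <- 2^(0:(n - 1))           # 1, 2, 4, ...: one bit per element of `set`
  set[bitwAnd(i - 1, masks) != 0]  # keep the elements whose bit is set in i - 1
}

# Same lookup as results[[6]] in the three-letter example
nth_subset(LETTERS[1:3], 6)

# An index into the 26-letter power set, computed without building the list
nth_subset(LETTERS, 13626980)
```

Because bitwAnd() works on 32-bit integers, this sketch should only be relied on for sets of up to about 31 elements; larger sets would need a different way of reading the bits.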

Related

R for loops: when to use i in seq_along(x) and when to use i in x

I am very new to R and got stuck on writing for loops. Sometimes I see people write for (i in seq_along(x)), while other times they write for (i in x). What is the difference between the two? Does it depend on the properties of x? Help appreciated!
Consider the following vector x:
x <- LETTERS[1:5]
x
[1] "A" "B" "C" "D" "E"
If you perform a for loop on x you are using the values of x:
for(i in x) print(i)
[1] "A"
[1] "B"
[1] "C"
[1] "D"
[1] "E"
If instead you use seq_along, you are creating an integer sequence of the same length as x:
for(i in seq_along(x)) print(i)
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Which one is appropriate depends on what you're ultimately trying to do. However, I frequently find myself using seq_along because it is trivial to subset x with i, whereas recovering the index of a value i within x takes more typing.
for(i in seq_along(x)) print(x[i])
[1] "A"
[1] "B"
[1] "C"
[1] "D"
[1] "E"
Another approach you sometimes might see is using 1:length(x). However, as @GregorThomas points out, this can cause unexpected behavior.
Consider the following empty vector y:
y <- vector()
for(i in seq_along(y)) print(1+i)
This results in no output because seq_along(y) evaluates to a zero-length vector.
In contrast, consider 1:length(y):
for(i in 1:length(y)) print(1+i)
[1] 2
[1] 1
This is because 1:length(y) evaluates to c(1,0).
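As a small sketch of the same point, the lengths of the three sequences can be compared directly. seq_len(length(y)) is shown here as an additional safe alternative; it is base R but is not mentioned in the answer above.

```r
y <- vector()   # zero-length vector, as in the example above

# Both of these are empty, so a loop over them never runs its body
safe_along <- seq_along(y)
safe_len   <- seq_len(length(y))

# This one counts down from 1 to 0, so a loop over it runs twice
unsafe <- 1:length(y)

length(safe_along)
length(safe_len)
length(unsafe)
```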

How to use a list as an argument in a function in R

myfunction <- function(x) {
if (x == "g") {
g_var <- x
g_nvar <- length(g_var)
}
return(g_nvar)
}
I have written the above script to obtain specific elements out of a list. The argument x will be a list when I call this function, but R does not treat x as a list. How can I write a function such that, when I provide a list, the output is the elements that I have specified in the function?
m
[[1]]
[[1]][[1]]
[1] "g" "g" "h" "g" "g" "g" "k" "l"
[[2]]
[[2]][[1]]
[1] "g" "h" "k" "k" "l" "g"
Expected result
[[1]] 5 # No. of g
[[2]] 2 # No. of g
Similarly, I would like to obtain the counts for h, k and l. I am passing m as x when calling the function.
For example: myfunction(m)
Your case is somewhat complicated by the fact that your m is not simply a list of character vectors, but, in your example, a list of 2 lists of 1 vector of characters, as would be generated by
m = list(strsplit("gghgggkl", ""), strsplit("ghkklg", ""))
If we want myfunction to operate on this data structure, we have to refer to the component of the length-1-lists with the operation [[1]] (see x[[1]] below), and, as loki suggested, we can use lapply to work on all components of the outer list, and sum with a logical expression to obtain the desired count:
myfunction = function(m) lapply(m, function(x) sum(x[[1]]=='g'))
myfunction(m)
result:
[[1]]
[1] 5
[[2]]
[1] 2
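Since the question also asks for the counts of h, k and l, the same idea can be generalised. The following is a minimal sketch; the function names count_letter and count_all are hypothetical and assume the same nested structure for m as above.

```r
# Same structure as above: a list of length-1 lists of character vectors
m <- list(strsplit("gghgggkl", ""), strsplit("ghkklg", ""))

# Hypothetical generalisation: count any single letter, not just "g"
count_letter <- function(m, letter) {
  lapply(m, function(x) sum(x[[1]] == letter))
}
count_letter(m, "k")

# Count every letter at once by tabulating each inner vector
count_all <- function(m) {
  lapply(m, function(x) table(x[[1]]))
}
count_all(m)
```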

What's the name of the first argument of `[`?

letters[2] is equivalent to `[`(letters, i = 2); the second argument is i.
What is the name of the first argument so the 2 following expressions would be equivalent ?
lapply(1:3,function(x){letters[x]})
lapply(1:3,`[`,param1 = letters) # param1 to be replaced with solution
To define a function like the one above, you have to pass two arguments to it, because [ takes several inputs. We can use Map instead of lapply to supply both the data to extract from and the indices that select the part to extract:
Map("[",list(letters),1:3)
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
This is similar to what you have above. Hope this helps
You could be more specific than "[", for instance:
lapply(1:3, `[.numeric_version`, x = letters)
# [[1]]
# [1] "a"
#
# [[2]]
# [1] "b"
#
# [[3]]
# [1] "c"
(Not sure [.numeric_version is the most appropriate, though... I'm digging a bit more)
rlang::as_closure and purrr::as_mapper, both based on rlang::as_function (see doc), will both convert [ to a function with named parameters:
lapply(1:3, purrr::as_mapper(`[`), .x = letters)
lapply(1:3, rlang::as_closure(`[`), .x = letters)
# [[1]]
# [1] "a"
#
# [[2]]
# [1] "b"
#
# [[3]]
# [1] "c"
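If no extra package is wanted, another option is to wrap [ in an ordinary function whose parameters carry names. This is only a sketch; the wrapper name extract and its parameter names x and i are hypothetical choices, not names defined by base R.

```r
# Hypothetical wrapper around `[` with explicitly named parameters
extract <- function(x, i) x[i]

# x is matched by name, so each element of 1:3 is passed on as i
res <- lapply(1:3, extract, x = letters)
res
```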

extracting element of a vector based on index given in a list in R

I am looking to extract elements of a character vector based on the indices in a list. For example, I have a vector CV, and the indices I would like to extract are in a list AL. I can extract them individually (through a loop), but I was wondering if there is a way to do it without a loop (perhaps using an apply function). I tried using sapply unsuccessfully.
CV = c("a","b","c","d")
AL = list(c(1,2),c(2,3,4),c(2))
CV[AL[[1]]]
[1] "a" "b"
sapply(CV,'[',AL)
Your problem is that
sapply(CV,'[',AL)
will (attempt to) iterate over each element of CV, but you want to iterate over each element of AL:
sapply(AL, function(z) CV[z])
# [[1]]
# [1] "a" "b"
#
# [[2]]
# [1] "b" "c" "d"
#
# [[3]]
# [1] "b"
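A closely related sketch uses lapply, which always returns a list. sapply happens to return a list here only because the index vectors have different lengths; when the results all have the same length it simplifies them to a vector or matrix instead. The variable name picked is just for illustration.

```r
CV <- c("a", "b", "c", "d")
AL <- list(c(1, 2), c(2, 3, 4), c(2))

# One list element per index vector in AL, regardless of their lengths
picked <- lapply(AL, function(idx) CV[idx])
picked
```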

Searching for specified pattern in R list

I have a dataset stored as a list DataList
[[1]]
[1] a
[2] f
[3] e
[4] a
[[2]]
[1] f
[2] f
[3] e
I am trying to create a function GetFrequence which returns the frequency of a given pattern in the list DataList
GetFrequence <- function(pattern, DataList)
{
freq = 0
i = 1
while (i <= length(DataList))
{
if (.....)
freq = freq + 1
i = i + 1
}
return(freq)
}
My question is how can I search if the given pattern exists in the list?
I assume that with pattern, you mean the different elements in your list. Then something like this might be helpful?
First, let us create a list roughly similar to the one you have provided above:
a <- list(letters[1:3], letters[1:2], letters[1:5])
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "a" "b"
[[3]]
[1] "a" "b" "c" "d" "e"
Now, to get the frequency of each item across the whole list, we can unlist the list and stack everything into one vector. Once we have a simple vector left, we can use table.
table(unlist(a))
a b c d e
3 3 2 1 1
Note that you may have to use unlist several times, depending on your actual list-structure. That is, if you have a list of lists, it might be necessary to adjust the code somewhat. In that case, please post str(your_list).
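To get the count for one given pattern rather than the full table, the same unlist idea can be wrapped in a function. This is a minimal sketch that assumes "pattern" means an exact match against individual elements, and that DataList looks like the list in the question.

```r
# Assumed reconstruction of the list shown in the question
DataList <- list(c("a", "f", "e", "a"), c("f", "f", "e"))

# Count how many elements across the whole list are equal to `pattern`
GetFrequence <- function(pattern, DataList) {
  sum(unlist(DataList) == pattern)
}

GetFrequence("f", DataList)
```

If partial or regular-expression matches are wanted instead, the comparison could be swapped for grepl(pattern, unlist(DataList)).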
