Sum specified elements of sub-lists - r

The idea:
Suppose I have a list with two vectors. I would like to take the first element of the first vector and divide it by the sum of that element and the first element of the second vector. Then do the same for every element of the first vector. After that, do the same thing for the second vector of the list.
The code for the list:
tau1 <- list(c(0.43742669, 0.64024429, 0.39660069, 0.11849773), c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
My working code, for two vectors only:
Tau1 <- vector('list', 2)
for(i in seq_along(tau1)){
for(j in 1:length(tau1[[1]])){
Tau1[[i]][[j]] <- tau1[[i]][[j]] / Reduce('+', tau1[[1]][[j]], tau1[[2]][[j]])
}
}
Example:
First element of the list:
TT1 <- tau1[[1]][[1]]/(tau1[[1]][[1]]+tau1[[2]][[1]])
[1] 0.4636196
Then for the second element of the list:
TT2 <- tau1[[2]][[1]]/(tau1[[1]][[1]]+tau1[[2]][[1]])
[1] 0.5363804
The problem:
I would like to do that for an arbitrary number of vectors. For example,
Reduce('+', tau1[[1]][[j]], tau1[[2]][[j]], tau1[[3]][[j]], tau1[[4]][[j]])
How can I do that automatically? Any help, please?

If we are using Reduce, we need to pass it the whole list tau1, without the [[i]] subsetting, so that it sums the corresponding elements of all the vectors and returns a vector. Then subset that vector by the j-th index and use it to divide the j-th element of tau1[[i]].
Tau1 <- vector('list', length(tau1))
for(i in seq_along(tau1)){
for(j in seq_along(tau1[[1]])){
Tau1[[i]][[j]] <- tau1[[i]][[j]] /Reduce(`+`, tau1)[j]
}
}
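To see what Reduce contributes here, the following is a minimal sketch on a toy list. The list `x` and the names `totals` and `shares` are made up for illustration and do not come from the question.

```r
# Toy data (hypothetical): three vectors of equal length
x <- list(c(1, 2), c(3, 4), c(5, 6))

# Reduce folds `+` over the list: (x[[1]] + x[[2]]) + x[[3]],
# i.e. an element-wise sum across all vectors
totals <- Reduce(`+`, x)
totals
# [1]  9 12

# Each vector divided by the element-wise totals
shares <- lapply(x, function(v) v / totals)
```

Because the fold runs over the whole list, the same two lines should work for any number of vectors, as long as they all have the same length.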
Regarding the error mentioned in the comments, it can happen if there are non-numeric elements. The OP mentioned NULL elements, but a real NULL can only occur as an element of a list, not inside a vector. So it is possible that a vector contains the character string "NULL". For example:
tau1 <- list(c(0.43742669 , 0.64024429, "NULL", 0.11849773),
c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
Upon running the code above
Error in f(init, x[[i]]) : non-numeric argument to binary operator
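One way to catch this before the sum is to check the type of each vector first. This is only a sketch: the names `tau_bad`, `is_num`, and `tau_fixed` are hypothetical, and turning unparseable entries into NA is an assumption about what the OP would want.

```r
tau_bad <- list(c(0.43742669, 0.64024429, "NULL", 0.11849773),
                c(0.5060767, 0.4857891, 0.4553237, 0.5045598))

# A single string coerces the whole first vector to character
is_num <- vapply(tau_bad, is.numeric, logical(1))
is_num
# [1] FALSE  TRUE

# Assumed repair: convert back to numeric; "NULL" becomes NA
# (as.numeric warns about the coercion, hence suppressWarnings)
tau_fixed <- lapply(tau_bad, function(v) suppressWarnings(as.numeric(v)))
```

After the conversion Reduce runs without error, but the NA propagates into the third position of the sum, so it still has to be handled explicitly.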

To work with any number of list elements, use the apply family of functions with [[.
To extract the j-th element of each sub-list, use sapply(tau1, "[[", j). To sum those elements, use sum(sapply(tau1, "[[", j)).
PS: Instead of for(j in 1:length(tau1[[1]])){} you should have for(j in 1:length(tau1[[i]])){}, just in case.
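Put together, the pieces above might look like the sketch below. It assumes all vectors in tau1 have the same length; the pre-allocation with numeric() is an addition for illustration.

```r
tau1 <- list(c(0.43742669, 0.64024429, 0.39660069, 0.11849773),
             c(0.5060767, 0.4857891, 0.4553237, 0.5045598))

Tau1 <- vector("list", length(tau1))
for (i in seq_along(tau1)) {
  # Pre-allocate the result vector for the i-th sub-list
  Tau1[[i]] <- numeric(length(tau1[[i]]))
  for (j in seq_along(tau1[[i]])) {
    # Sum of the j-th elements across all sub-lists
    total_j <- sum(sapply(tau1, "[[", j))
    Tau1[[i]][j] <- tau1[[i]][[j]] / total_j
  }
}
```

For the first elements this reproduces the values from the question, 0.4636196 and 0.5363804.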

Here is a base R one-liner:
lapply(tau1, "/", do.call(mapply, c(FUN = sum, tau1)))
# [[1]]
# [1] 0.4636196 0.5685838 0.4655351 0.1901875
#
# [[2]]
# [1] 0.5363804 0.4314162 0.5344649 0.8098125
Or alternatively (from @lmo's comment):
lapply(tau1, "/", Reduce("+", tau1))
Here is a purrr equivalent:
library(purrr)
tau1 %>% map(`/`, pmap_dbl(., sum))
# [[1]]
# [1] 0.4636196 0.5685838 0.4655351 0.1901875
#
# [[2]]
# [1] 0.5363804 0.4314162 0.5344649 0.8098125

Related

Transform a function into something that lapply works on properly (lists)

I'm trying to make a list with 10 elements, each element consisting of 5 * i items drawn from a uniform distribution, i being the ith entry, and I want to use lapply.
Currently I made this function:
z_list <- list()
z_list_generator <- function(n) {
for(i in 1:n){
a <- runif(5 * i)
tmp <- list(a)
mybiglist[[i]] <- tmp
}
mybiglist
}
This function does give the correct outcome when I just put z_list_generator(2), it prints a list with the first element consisting of 5 elements, the second of 10 elements.
What I want to achieve is that I do lapply(some number, z_list_generator) such that it generates this same list, and such that when I do length(lapply(some number, z_list_generator)), the outcome is 'some number'.
Do you mean something like this?
z_list_generator <- function(k) lapply(1:k, function(i) runif(5 * i))
set.seed(2018) # Fixed random seed for reproducibility
z_list_generator(2)
#[[1]]
#[1] 0.33615347 0.46372327 0.06058539 0.19743361 0.47431419
#
#[[2]]
# [1] 0.3010486 0.6067589 0.1300121 0.9586547 0.5468495 0.3956160 0.6645386
# [8] 0.9821123 0.6782154 0.8060278
length(z_list_generator(2))
#[1] 2
Your z_list_generator is strange.
1) You do not initialise mybiglist in your function code. It probably modifies some global variable.
2) You assign to each element of mybiglist another list (of length 1), whose first element contains a sample from a uniform distribution. Better to assign a, not tmp, there.
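With both points applied, a loop-based version of the function could look like this sketch (one possible reading of the advice above, not necessarily what the answerer had in mind):

```r
z_list_generator <- function(n) {
  # Initialise the list inside the function (point 1)
  mybiglist <- vector("list", n)
  for (i in seq_len(n)) {
    # Store the sample itself, not a list wrapping it (point 2)
    mybiglist[[i]] <- runif(5 * i)
  }
  mybiglist
}

lengths(z_list_generator(3))
# [1]  5 10 15
```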

R - How to create a new list from elements in a list? [duplicate]

Say I have a list, preds, that contains 3 different sets of predictions based on unique models.
set.seed(206)
preds <- list(rnorm(1:100), rnorm(1:100), rnorm(1:100))
I am having trouble iterating through the list and creating a new list. I want a list of new vectors that are composed of the corresponding index in the other vectors. So for example, I can get the first element in each vector of the list using sapply() like this:
sapply(preds, "[[", 1)
Which works well, but I cannot figure out how to iterate through each element in each vector to construct a new list. I want something like this (doesn't work):
sapply(preds, "[[", 1:100)
Using the above seed and preds list, my desired output would be something like:
first_element <- sapply(preds, "[[", 1)
second_element <- sapply(preds, "[[", 2)
third_element <- sapply(preds, "[[", 3)
desired_result <- rbind(first_element, second_element, third_element)
But for all elements in the list, which in this example is 100.
Please let me know if this is not clear, and I appreciate any help!
We need to use [ instead of [[, because [[ extracts a single element while [ accepts a vector of indices:
res <- sapply(preds, `[`, 1:3 )
all.equal(res, desired_result, check.attributes=FALSE)
#[1] TRUE
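To cover every index rather than just the first three, the same call can take seq_along() of one of the vectors. This is a sketch that assumes all vectors in preds have the same length; `res_all` is a name introduced here.

```r
set.seed(206)
preds <- list(rnorm(100), rnorm(100), rnorm(100))

# One row per index, one column per model
res_all <- sapply(preds, `[`, seq_along(preds[[1]]))
dim(res_all)
# [1] 100   3

# Row 1 holds the first element of every vector
all.equal(res_all[1, ], sapply(preds, "[[", 1))
# [1] TRUE
```

If the vectors had different lengths, sapply would return a list instead of a matrix, so the equal-length assumption matters.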
Here's an alternative approach:
nth_element <- function(dat, n){
res <- sapply(dat, "[[", n)
res
}
res2 <- do.call(rbind, lapply(c(1:3), function(x) nth_element(dat = preds, n = x)))
Borrowing akrun's check code:
all.equal(res2, desired_result, check.attributes=FALSE)

Is it possible to modify list elements?

I have a list of records:
z <- list(list(a=1),list(a=4),list(a=2))
and I try to add fields to each of them.
Alas, neither
lapply(z,function(l) l$b <- 1+l$a)
nor
for(l in z) l$b <- 1+l$a
modifies z.
In this simple case I can, of course, do
z <- lapply(z,function(l) c(list(b= 1+l$a),l))
but this quickly gets out of hand when the lists have more nesting:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
How do I turn it into
list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
without repeating the definition of the whole structure?
Each element of z has many fields, not just a; and z[[10]]$a has many subfields, not just b.
Your first code example doesn't modify the list because you need to return the list in your call to lapply:
z <- list(list(a=1),list(a=4),list(a=2))
expected <- list(list(a=1, b=2), list(a=4, b=5), list(a=2, b=3))
outcome <- lapply(z,function(l) {l$b <- 1+l$a ; l})
all.equal(expected, outcome)
# [1] TRUE
In the doubly nested example, you could use lapply within lapply, again making sure to return the list in the inner lapply:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
expected <- list(list(a=list(b=1, c=2)), list(a=list(b=4, c=5)), list(a=list(b=2, c=3)))
obtained <- lapply(z, function(l1) { lapply(l1, function(l2) {l2$c = l2$b+1 ; l2 } )})
all.equal(expected, obtained)
# [1] TRUE
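The for loop from the question fails for a related reason: `l` is a copy of each element, so assigning to it never touches z. A minimal sketch of an index-based loop that modifies the list in place (the name `z2` is introduced here for the nested case):

```r
z <- list(list(a = 1), list(a = 4), list(a = 2))
# Assign through the index so that z itself is updated
for (i in seq_along(z)) {
  z[[i]]$b <- 1 + z[[i]]$a
}

z2 <- list(list(a = list(b = 1)), list(a = list(b = 4)), list(a = list(b = 2)))
# The same idea reaches into deeper levels without rebuilding the structure
for (i in seq_along(z2)) {
  z2[[i]]$a$c <- z2[[i]]$a$b + 1
}
```

Only the field being added is named, so other fields and subfields of each record are left untouched.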
Another, somewhat convoluted, option:
z <- list(list(a=1),list(a=4),list(a=2))
res <- list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
res1 <- rapply(z,function(x) list(b = x,c = x+1),how = "replace")
> all.equal(res,res1)
[1] TRUE
I only say convoluted because rapply can be tricky to use at times (for me at least).

compare one list item against the rest in R

If you have a list of files, and you want to compare 1 against a set of the others, how do you do it?
my.test <- list[1]
my.reference.set <- list[-1]
This works of course, but I want to have this in a loop, with my.test varying each time, so that each file in the list is my.test for one iteration. I.e., I have a list of 250 files, and I want to do this for every subset of 12 files within it.
> num <- (1:2)
> sdasd<- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
> splitlist<- split(sdasd, num)
> splitlist
$`1`
[1] "asds" "nasd" "nadsd"
$`2`
[1] "ksad" "ksasd" "kasdih"
> for (i in splitlist) {my.test <- splitlist[i] # "asds"
+ my.reference.set <- splitlist[-i] # "nasd" and "nadsd"
+ combined <- data.frame (my.test, my.reference.set)
+ combined}
Error in -i : invalid argument to unary operator
>
Then I want the next iteration to be:
my.test <- splitlist[i] #my.test to be "nasd"
my.reference.set <- splitlist[-i] # "asds" and "nadsd"
}
and finally for splitlist[1],
my.test <- splitlist[i] # "nadsd"
my.reference.set <- splitlist[-i] # "asds" and "ksad"
}
Then the same for splitlist[2]
Does this do what you want? The key point here is to loop over the indices of the list rather than over its elements, because x[-n] indexing only works when n is numeric (a positive integer index, with some obscure exceptions). Also, I wasn't sure if you wanted the results as a data frame or a list; the latter allows the components to be different lengths.
num <- 1:2
sdasd <- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
splitlist<- split(sdasd, num)
L <- vector("list",length(splitlist))
for (i in seq_along(splitlist)) {
my.test <- splitlist[[i]] # the i-th group, e.g. "asds" "nasd" "nadsd"
my.reference.set <- splitlist[-i] # all the other groups
L[[i]] <- list(test=my.test, ref.set=my.reference.set)
}
edit: I'm still a little confused by your example above, but I think this is what you want:
refs <- lapply(splitlist,
function(S) {
lapply(seq_along(S),
function(i) {
list(test=S[i], ref.set=S[-i])
})
})
refs is a nested list; the top level has length 2 (the length of splitlist), each of the next levels has length 3 (the lengths of the elements of splitlist), and each of the bottom levels has length 2 (test and reference set).
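As a sketch of how the nested result can be read back, the snippet below rebuilds refs from the example data and picks out one test/reference pair. The name `second` is introduced here for illustration.

```r
sdasd <- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
splitlist <- split(sdasd, 1:2)

refs <- lapply(splitlist, function(S) {
  lapply(seq_along(S), function(i) list(test = S[i], ref.set = S[-i]))
})

# Second iteration within group "1"
second <- refs[["1"]][[2]]
second$test
# [1] "nasd"
second$ref.set
# [1] "asds"  "nadsd"
```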

R - Repetitions of an array in other array

I have an array that was sliced from a data frame.
I want to count how many times a certain sequence appears in it.
For example
main <- c(A,B,C,A,B,V,A,B,C,D,E)
p <- c(A,B,C)
q <- c(A,B)
someFunction(main,p)
2
someFunction(main,q)
3
I've been messing around with rle, but it also counts every sub-repetition, which is undesirable.
Is there a quick solution I'm missing?
You can use one of the regular expression tools in R since this is really a pattern matching exercise, specifically gregexpr for this question. The p and q vectors represent the search pattern and main is where we want to search for those patterns. From the help page for gregexpr:
gregexpr returns a list of the same length as text each element of which is of
the same form as the return value for regexpr, except that the starting positions
of every (disjoint) match are given.
So we can take the length of the first element of the list returned by gregexpr, which holds the starting positions of the matches. We'll first collapse the vectors and then do the searching:
someFunction <- function(haystack, needle) {
haystack <- paste(haystack, collapse = "")
needle <- paste(needle, collapse = "")
out <- gregexpr(needle, haystack)
out.length <- length(out[[1]])
return(out.length)
}
> someFunction(main, p)
[1] 2
> someFunction(main, q)
[1] 3
Note: you also need to put quotes around the elements of your main, p, and q vectors unless you have variables A, B, C, etc. defined.
main <- c("A","B","C","A","B","V","A","B","C","D","E")
p <- c("A","B","C")
q <- c("A","B")
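One caveat with counting via length(): when there is no match, gregexpr returns -1, so the length is still 1. A hedged variant that counts only positive start positions is sketched below. The name `count_matches` is hypothetical, and fixed = TRUE is an added assumption so that the needle is treated as literal text rather than a regular expression.

```r
count_matches <- function(haystack, needle) {
  haystack <- paste(haystack, collapse = "")
  needle <- paste(needle, collapse = "")
  # Start positions of the matches, or -1 if there are none
  hits <- gregexpr(needle, haystack, fixed = TRUE)[[1]]
  sum(hits > 0)
}

main <- c("A","B","C","A","B","V","A","B","C","D","E")
count_matches(main, c("A", "B", "C"))
# [1] 2
count_matches(main, c("A", "B"))
# [1] 3
count_matches(main, c("X", "Y"))
# [1] 0
```

Collapsing with an empty separator is only unambiguous when every element is a single character, as it is in this example.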
I'm not sure if this is the best way, but you can simply do that work by:
f <- function(a,b)
if (length(a) > length(b)) 0
else all(head(b, length(a)) == a) + Recall(a, tail(b, -1))
Someone may or may not find a built-in function.
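For reference, a usage sketch of the recursive function with the example data. Note the argument order: the pattern comes first and the vector to search comes second.

```r
f <- function(a, b)
  if (length(a) > length(b)) 0
  else all(head(b, length(a)) == a) + Recall(a, tail(b, -1))

main <- c("A","B","C","A","B","V","A","B","C","D","E")
f(c("A", "B", "C"), main)
# [1] 2
f(c("A", "B"), main)
# [1] 3
```

Each call compares the pattern with the head of the remaining vector and then recurses on the tail, so very long vectors may run into R's recursion depth limit.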
Using sapply:
find_x_in_y <- function(x, y){
sum(sapply(
seq_len(length(y)-length(x)+1), # +1 so that the last window is checked too
function(i)as.numeric(all(y[i:(i+length(x)-1)]==x))
))
}
find_x_in_y(c("A", "B", "C"), main)
[1] 2
find_x_in_y(c("A", "B"), main)
[1] 3
Here's a way to do it using embed(v,n), which returns a matrix of all n-length sub-sequences of vector v:
find_x_in_y <- function(x, y)
sum( apply( embed( y, length(x)), 1,
identical, rev(x)))
> find_x_in_y(p, main)
[1] 2
> find_x_in_y(q, main)
[1] 3
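The rev(x) is needed because embed stores each window in reverse order. A small sketch on integers makes this visible (the name `m` is introduced here):

```r
m <- embed(1:5, 2)
m
#      [,1] [,2]
# [1,]    2    1
# [2,]    3    2
# [3,]    4    3
# [4,]    5    4
```

Each row is one window of length 2 with the later element first, so a window matches x exactly when the row is identical to rev(x).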

Resources