Storing the values from IF loop in a vector - r

I am reading bins.txt and saving its contents in "data". I tried printing it, and it prints correctly.
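data <- read.csv("bins.txt", header = FALSE)
for (n in 1:24060) {
  j <- data[n, ]          # value in row n
  for (i in 1:20) {
    m <- (i - 1) * 80     # lower bound of bin i
    n <- (i * 80) - 1     # upper bound of bin i (reuses the outer loop's name n;
                          # for() resets n on the next outer iteration, but it is confusing)
    if (m < j && j < n) {
      print(i)
    }
  }
}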
I don't want to print(i); instead I want to store the values of i in a vector, print that vector outside the loop, and pass it in as
obs="vector"
Something like this.

No idea what your bins.txt is. Since I really dislike nested loops, here's a suggestion:
(i) define the twenty pairs of min (or m) and max (or n) values used in the condition check:
m <- lapply(1:20, function(x) (x-1)*80)
n <- lapply(1:20, function(x) (x*80)-1)
(ii) return a list of twenty vectors, filtering the data against each of the twenty combinations of m and n:
lapply(1:20, function(x) dat[m[[x]] < dat & dat < n[[x]]])
Assuming that your data is
dat <- seq(0, 1000, length.out=50)
The first six vectors returned are:
[[1]]
[1] 20.40816 40.81633 61.22449
[[2]]
[1] 81.63265 102.04082 122.44898 142.85714
[[3]]
[1] 163.2653 183.6735 204.0816 224.4898
[[4]]
[1] 244.8980 265.3061 285.7143 306.1224
[[5]]
[1] 326.5306 346.9388 367.3469 387.7551
[[6]]
[1] 408.1633 428.5714 448.9796 469.3878
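If you prefer to keep the question's loop, here is a minimal sketch that collects each matching bin index into a vector instead of printing it. The vector vals is a hypothetical stand-in for the contents of bins.txt, and bins, lower and upper are illustrative names; the bounds are the same as the original m and n.

```r
# Hypothetical stand-in for the values read from bins.txt
vals <- c(5, 85, 170, 1500)

bins <- integer(0)                # collected bin indices, starts empty
for (n in seq_along(vals)) {
  j <- vals[n]
  for (i in 1:20) {
    lower <- (i - 1) * 80         # same bounds as the original m ...
    upper <- (i * 80) - 1         # ... and n, renamed so the loop index is not overwritten
    if (lower < j && j < upper) {
      bins <- c(bins, i)          # store the bin index instead of printing it
    }
  }
}
print(bins)
```

The collected vector can then be passed on, e.g. as obs = bins. Because the comparisons are strict, a value equal to a bound (such as 0, 79 or 80) matches no bin, exactly as in the original loop.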

Related

Filter list in R based on criteria within list objects

This is a trivial question, but I'm stumped. How can I filter a list of dataframes based on their length? The list is nested -- meaning there are lists of lists of dataframes of different lengths. Here is an example. I'd like to filter or subset the list to include only those objects that are length n, say 3.
Here is an example and my current approach.
library(tidyverse)
# list of list with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
list(starwars[1:8], starwars[4:6]),
starwars[1:2],
list(starwars[1:7], starwars[2:6]),
starwars[1:3])
# I want to filter the list by dataframes that are 3 variables long (i.e. length(df) == 3).
# Here is my attempt, I'm stuck at how to obtain
# the number of variables in each dataframe and then filter by it.
map(star.wars_ls, function(x){
map(x, function(x){ ## Incorrectly returns 20 for all
length(y)
})
})
We can do
map(star.wars_ls, ~ if(is.data.frame(.x)) .x[length(.x) == 3] else map(.x, ~ .x[length(.x) == 3]))
You should be able to check whether each item in star.wars_ls is a list or a data frame, and then check the number of columns within each item. Try using:
library(tidyverse)
# list of list with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
list(starwars[1:8], starwars[4:6]),
starwars[1:2],
list(starwars[1:7], starwars[2:6]),
starwars[1:3])
# I want to filter the list by dataframes that are 3 variables long (i.e. length(df) == 3).
datacols <- map(star.wars_ls, function(X) {
  if (is.data.frame(X)) {
    ncol(X)                  # a single data frame: its number of columns
  } else {
    map(X, function(Y) {     # a list of data frames: columns of each one
      ncol(Y)
    })
  }
})
# > datacols
# [[1]]
# [1] 5
#
# [[2]]
# [[2]][[1]]
# [1] 8
#
# [[2]][[2]]
# [1] 3
#
#
# [[3]]
# [1] 2
#
# [[4]]
# [[4]][[1]]
# [1] 7
#
# [[4]][[2]]
# [1] 5
#
#
# [[5]]
# [1] 3
This will only give you the length (number of columns) of each data frame within the list. To get the indices (I'm sure there's a more efficient way to do this -- maybe someone else can help with that):
indexlist <- c()
for (i in seq_along(datacols)) {
  if (length(datacols[[i]]) == 1) {
    if (datacols[[i]][1] == 3) {
      index <- i
      indexlist <- c(indexlist, as.character(index))
    }
  } else {
    for (j in seq_along(datacols[[i]])) {
      if (datacols[[i]][[j]][1] == 3) {
        index <- str_c(i, ",", j)
        indexlist <- c(indexlist, index)
      }
    }
  }
}
# > indexlist
# [1] "2,2" "5"
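The loop above handles only one level of nesting. A minimal recursive sketch that returns the same kind of comma-separated index paths at any depth could look like this; find_paths is a hypothetical helper name, and built-in mtcars columns stand in for starwars so the block runs without the tidyverse.

```r
# Return comma-separated index paths of all data frames with exactly k columns
find_paths <- function(x, k, prefix = character(0)) {
  out <- character(0)
  for (i in seq_along(x)) {
    path <- c(prefix, i)                        # coerced to character, e.g. c("2", "2")
    if (is.data.frame(x[[i]])) {
      if (ncol(x[[i]]) == k) out <- c(out, paste(path, collapse = ","))
    } else {
      out <- c(out, find_paths(x[[i]], k, path))  # descend into a nested list
    }
  }
  out
}

# Nested list shaped like star.wars_ls, with one extra level of nesting
ls_demo <- list(mtcars[1:5],
                list(mtcars[1:8], mtcars[4:6]),
                mtcars[1:2],
                list(mtcars[1:7], list(mtcars[2:4])),
                mtcars[1:3])
find_paths(ls_demo, 3)   # returns c("2,2", "4,2,1", "5")
```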
You could use recursion, which works no matter how deeply nested the list is:
ff = function(x)map(x,~if(is.data.frame(.x)){if(length(.x)==3) .x} else ff(.x))
ff(star.wars_ls)
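ff keeps the original nesting and leaves NULL wherever a data frame does not match. If a single flat list of the matching data frames is more convenient, a base R sketch along the same recursive lines is below; keep_ncol is a hypothetical name, and mtcars columns again stand in for starwars.

```r
# Collect every data frame with exactly k columns into one flat list,
# however deeply the input is nested
keep_ncol <- function(x, k) {
  if (is.data.frame(x)) {
    if (ncol(x) == k) list(x) else list()
  } else {
    # concatenate the (possibly empty) lists returned for each element
    Reduce(c, lapply(x, keep_ncol, k = k), list())
  }
}

ls_demo <- list(mtcars[1:5], list(mtcars[1:8], mtcars[4:6]), mtcars[1:2],
                list(mtcars[1:7], list(mtcars[2:4])), mtcars[1:3])
res <- keep_ncol(ls_demo, 3)
length(res)
```

Wrapping each match in list() before concatenating keeps c() from splitting a data frame into its columns.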

Avoid storing null values when skipping an iteration in a for loop

Is there a way to avoid storing NULL values in an iterative process when a condition triggers a skip to the next iteration? I would like to solve this within the structure of the loop itself.
[CONTEXT]:
I mean the case where you store results inside a loop together with a conditional statement, and one of the possible branches is of no interest to you. To handle it at that moment, rather than after the computation, you skip to the next iteration.
[EXAMPLE]
Suppose that, given a sequence of numbers, I am only interested in storing the numbers greater than 2 in a list.
storeGreaterThan2 <- function(x){
y <- list()
for (i in seq_along(x)) {
if (x[i] > 2) {
y[[i]] <- x[i]
} else {
next
}
}
y
}
The function above achieves the goal, but whenever the skip condition is triggered, the skipped index is filled with NULL in the final list.
> storeGreaterThan2(1:5)
[[1]]
NULL
[[2]]
NULL
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
How can this be handled within the structure of the loop itself?
This is a rather strange example, and I wonder if it's an XY problem. It may be better to say more about your situation and what you ultimately want to do. For example, there are different ways of doing this depending on whether the function's input will always be an ascending sequence. @Dave2e's comment that there will be better ways depending on what you are really after is right on the mark, in my opinion. At any rate, you can simply remove the NULL elements before you return the list. Consider:
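storeGreaterThan2 <- function(x){
  y <- list()
  for(i in seq_along(x)) {
    if(x[i] > 2) {
      y[[i]] <- x[i]
    } else {
      next
    }
  }
  # drop NULL entries; a logical mask is safe even when there are none,
  # whereas y[-which(...)] would return an empty list in that case
  y <- y[!vapply(y, is.null, logical(1))]
  return(y)
}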
storeGreaterThan2(1:5)
# [[1]]
# [1] 3
#
# [[2]]
# [1] 4
#
# [[3]]
# [1] 5
Here is a possible way to do this without ever having stored the NULL element, rather than cleaning it up at the end:
storeGreaterThan2 <- function(x){
  y <- list()
  l <- 1                        # l is an index for the list
  for(i in seq_along(x)){       # i is an index for the x vector
    if(x[i] > 2) {
      y[[l]] <- x[i]
      l <- l+1
    }
  }
  return(y)
}
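A common variant of the same idea drops the separate counter and always appends at length(y) + 1, so a skipped iteration never leaves a gap. This is a sketch, not code from the answers above; the last line shows a vectorised alternative that avoids the loop altogether.

```r
storeGreaterThan2 <- function(x) {
  y <- list()
  for (i in seq_along(x)) {
    if (x[i] <= 2) next          # skip values that are not of interest
    y[[length(y) + 1]] <- x[i]   # append at the end of the list
  }
  y
}

storeGreaterThan2(1:5)

# Vectorised equivalent without an explicit loop
as.list((1:5)[(1:5) > 2])
```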

print list names when iterating lapply [duplicate]

This question already has answers here:
Access lapply index names inside FUN
I have four time series (x, y, z and a) in a list called dat.list. I would like to apply a function to this list using lapply. Is there a way to print the element names (i.e., x, y, z and a) after each iteration of lapply? Below is a reproducible example.
## Create Dummy Data
x <- ts(rnorm(40,5), start = c(1961, 1), frequency = 12)
y <- ts(rnorm(50,20), start = c(1971, 1), frequency = 12)
z <- ts(rnorm(50,39), start = c(1981, 1), frequency = 12)
a <- ts(rnorm(50,59), start = c(1991, 1), frequency = 12)
dat.list <- list(x=x,y=y,z=z,a=a)
## forecast using lapply
abc <- function(x) {
r <- mean(x)
print(names(x))
return(r)
}
forl <- lapply(dat.list,abc)
Basically, I would like to print the element names x, y, z and a every time the function is executed on these elements. When I run the above code, I get NULL printed.
lapply passes only the values of the list elements to the function, not their names, so names(x) inside abc is NULL. If you want to see the names, the calling strategy needs to be different, for example passing the names as a separate argument with mapply:
abc <- function(nm, x) {
  r <- mean(x)
  print(nm)
  return(r)
}

forl <- mapply(abc, names(dat.list), dat.list)
# [1] "x"
# [1] "y"
# [1] "z"
# [1] "a"
You can use some deep digging (which I got from another answer on SO--I'll try to find the link) and do something like this:
abc <- function(x) {
r <- mean(x)
print(eval.parent(quote(names(X)))[substitute(x)[[3]]])
return(r)
}
forl <- lapply(dat.list, abc)
# [1] "x"
# [1] "y"
# [1] "z"
# [1] "a"
forl
# $x
# [1] 5.035647
#
# $y
# [1] 19.78315
#
# $z
# [1] 39.18325
#
# $a
# [1] 58.83891
Or you can just lapply across the names of the list (similar to what @BondedDust did), like this (but you lose the list names in the output):
abc <- function(x, y) {
r <- mean(y[[x]])
print(x)
return(r)
}
lapply(names(dat.list), abc, y = dat.list)
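If you want both the printed names and a named result, Map() (mapply with SIMPLIFY = FALSE) gives you both: when its first argument is a character vector, the result is named by it. A minimal sketch; the data mirror the question's example, and set.seed() is added only to make it reproducible.

```r
set.seed(1)
dat.list <- list(x = ts(rnorm(40, 5),  start = c(1961, 1), frequency = 12),
                 y = ts(rnorm(50, 20), start = c(1971, 1), frequency = 12),
                 z = ts(rnorm(50, 39), start = c(1981, 1), frequency = 12),
                 a = ts(rnorm(50, 59), start = c(1991, 1), frequency = 12))

# Map() calls the function with one name and one element at a time
forl <- Map(function(nm, x) {
  print(nm)     # prints "x", "y", "z", "a" in turn
  mean(x)
}, names(dat.list), dat.list)

names(forl)
```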

find all disjoint (non-overlapping) sets from a set of sets

My problem: need to find all disjoint (non-overlapping) sets from a set of sets.
Background: I am using comparative phylogenetic methods to study trait evolution in birds. I have a tree with ~300 species. This tree can be divided into subclades (i.e. subtrees). If two subclades do not share species, they are independent. I'm looking for an algorithm (and an R implementation if possible) that will find all possible subclade partitions where each subclade has greater than 10 taxa and all are independent. Each subclade can be considered a set and when two subclades are independent (do not share species) these subclades are then disjoint sets.
Hope this is clear and someone can help.
Cheers,
Glenn
The following code produces an example dataset. Here, subclades is a list of all possible subclades (sets) from which I'd like to sample X disjoint sets, where the length of each set is Y.
###################################
# Example Dataset
###################################
library(ape)
library(phangorn)
library(TreeSim)
library(phytools)
##simulate a tree
n.taxa <- 300
tree <- sim.bd.taxa(n.taxa,1,lambda=.5,mu=0)[[1]][[1]]
tree$tip.label <- seq(n.taxa)
##extract all monophyletic subclades
get.all.subclades <- function(tree){
  tmp <- vector("list")
  nodes <- sort(unique(tree$edge[,1]))   # internal (parent) nodes of the tree
  for(i in seq_along(nodes)){
    x <- Descendants(tree, nodes[i], type="tips")[[1]]   # tips below this node
    tmp[[i]] <- tree$tip.label[x]
  }
  tmp
}
tmp <- get.all.subclades(tree)
##set bounds on the maximum and mininum number of tips of the subclades to include
min.subclade.n.tip <- 10
max.subclade.n.tip <- 40
##function to replace subclades whose number of tips is outside [min, max] with NA
replace.trees <- function(x, min, max){
  if(length(x) >= min && length(x) <= max) x else NA
}
#apply replace.trees across all the subclades
tmp2 <- lapply(tmp, replace.trees, min = min.subclade.n.tip, max = max.subclade.n.tip)
##remove elements from list with NA, 
##all remaining elements are subclades with number of tips between
##min.subclade.n.tip and max.subclade.n.tip
subclades <- tmp2[!is.na(tmp2)]
names(subclades) <- seq(length(subclades))
Here's an example of how you might test each pair of list elements for zero overlap, extracting the indices of all non-overlapping pairs.
findDisjointPairs <- function(X) {
  ## Form a 2-column matrix enumerating all pairwise combos of X's elements
  ij <- t(combn(length(X), 2))
  ## A function that tests for zero overlap between a pair of vectors
  areDisjoint <- function(i, j) length(intersect(X[[i]], X[[j]])) == 0
  ## Use mapply to test for overlap between each pair and extract indices
  ## of pairs with no matches (drop = FALSE keeps a matrix even for one pair)
  ij[mapply(areDisjoint, ij[,1], ij[,2]), , drop = FALSE]
}
## Make some reproducible data and test the function on it
set.seed(1)
A <- replicate(sample(letters, 5), n=5, simplify=FALSE)
findDisjointPairs(A)
# [,1] [,2]
# [1,] 1 2
# [2,] 1 4
# [3,] 1 5
Here are some functions that might be useful.
The first computes all possible disjoint collections of a list of sets.
I'm using "collection" instead of "partition" because a collection does not necessarily cover the universe (i.e., the union of all sets).
The algorithm is recursive and is only practical when the number of possible collections is small. This does not necessarily mean that it won't work with a large list of sets, since the function removes the intersecting sets at every step.
If the code is not clear, please ask and I'll add comments.
The input must be a named list, and the result will be a list of collections, each of which is a character vector of set names.
DisjointCollectionsNotContainingX <- function(L, branch=character(0), x=numeric(0))
{
filter <- vapply(L, function(y) length(intersect(x, y))==0, logical(1))
L <- L[filter]
result <- list(branch)
for( i in seq_along(L) )
{
result <- c(result, Recall(L=L[-(1:i)], branch=c(branch, names(L)[i]), x=union(x, L[[i]])))
}
result
}
This is just a wrapper to hide auxiliary arguments:
DisjointCollections <- function(L) DisjointCollectionsNotContainingX(L=L)
The next function can be used to validate a given list of collections that are supposedly non-overlapping and "maximal".
For every collection, it will test if
1. all sets are effectively disjoint and
2. adding another set either results in a non-disjoint collection or an existing collection:
ValidateDC <- function(L, DC)
{
for( collection in DC )
{
for( i in seq_along(collection) )
{
others <- Reduce(f=union, x=L[collection[-i]])
if( length(intersect(L[[collection[i]]], others)) > 0 ) return(FALSE)
}
elements <- Reduce(f=union, x=L[collection])
for( k in seq_along(L) ) if( ! (names(L)[k] %in% collection) )
{
if( length(intersect(elements, L[[k]])) == 0 )
{
check <- vapply(DC, function(z) setequal(c(collection, names(L)[k]), z), logical(1))
if( ! any(check) ) return(FALSE)
}
}
}
TRUE
}
Example:
L <- list(A=c(1,2,3), B=c(3,4), C=c(5,6), D=c(6,7,8))
> ValidateDC(L,DisjointCollections(L))
[1] TRUE
> DisjointCollections(L)
[[1]]
character(0)
[[2]]
[1] "A"
[[3]]
[1] "A" "C"
[[4]]
[1] "A" "D"
[[5]]
[1] "B"
[[6]]
[1] "B" "C"
[[7]]
[1] "B" "D"
[[8]]
[1] "C"
[[9]]
[1] "D"
Note that the collections containing A and B simultaneously do not show up, due to their non-null intersection. Also, collections with C and D simultaneously don't appear. Others are OK.
Note: the empty collection character(0) is always a valid combination.
After creating all possible disjoint collections, you can apply any filters you want to proceed.
EDIT:
I've removed the line if( length(L)==0 ) return(list(branch)) from the first function; it's not needed.
Performance: If there is considerable overlapping among sets, the function runs fast. Example:
set.seed(1)
L <- lapply(1:50, function(.)sample(x=1200, size=20))
names(L) <- c(LETTERS, letters)[1:50]
system.time(DC <- DisjointCollections(L))
Result:
# user system elapsed
# 9.91 0.00 9.92
Total number of collections found:
> length(DC)
[1] 121791
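As one example of the filters mentioned earlier, the sketch below keeps only the maximal collections, i.e. those that no other collection in DC strictly contains. The recursive function is repeated from the answer so the block runs on its own; is_maximal is a hypothetical helper name.

```r
DisjointCollectionsNotContainingX <- function(L, branch = character(0), x = numeric(0)) {
  # drop sets that intersect the elements already used in this branch
  filter <- vapply(L, function(y) length(intersect(x, y)) == 0, logical(1))
  L <- L[filter]
  result <- list(branch)
  for (i in seq_along(L)) {
    result <- c(result, Recall(L = L[-(1:i)], branch = c(branch, names(L)[i]),
                               x = union(x, L[[i]])))
  }
  result
}
DisjointCollections <- function(L) DisjointCollectionsNotContainingX(L = L)

L <- list(A = c(1, 2, 3), B = c(3, 4), C = c(5, 6), D = c(6, 7, 8))
DC <- DisjointCollections(L)

# A collection is maximal if no other collection in DC strictly contains it
is_maximal <- function(cl, DC) {
  !any(vapply(DC, function(d) length(d) > length(cl) && all(cl %in% d), logical(1)))
}
maximal <- Filter(function(cl) is_maximal(cl, DC), DC)
maximal
```

For the small example this leaves the four two-set collections A+C, A+D, B+C and B+D. The pairwise check is quadratic in the number of collections, so on a result as large as the 121791 collections above it would be slow.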

R populate list by its values

Say I have a list:
> fs
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
[1] 61.90298 58.29699 54.90104 51.70293 48.69110
I want to "reverse fill" the rest of the list using its values. For example:
Element [[3]] should hold myFunction applied to consecutive pairs of [[4]]:
c( myFunction(fs[[4]][1], fs[[4]][2]), myFunction(fs[[4]][2], fs[[4]][3]), .... )
Element [[2]] should hold the myFunction values of [[3]], and so on.
I hope that's clear. What's the right way to do it? For loops? *applys? My last attempt, which leaves 1-3 empty:
n = length(fs)
for (i in rev(1:(n-1)))
child_fs = fs[[i+1]]
res = c()
for (j in 1:(i+1))
up = v(child_fs[j])
do = v(child_fs[j+1])
this_f = myFunction(up, do)
res[j] = this_f
fs[[i]] = res
Make fs easily reproducible
fs <- list(NULL, NULL, NULL, c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110))
To be able to show an example, make a trivial myFunction
myFunction <- function(a, b) {a + b}
You can loop over all but the last position in fs (in reverse order) and compute each one. Just call myFunction with two vectors: the next higher position's vector without its last element, and the same vector without its first element.
for (i in rev(seq_along(fs))[-1]) {
fs[[i]] <- myFunction(head(fs[[i+1]], -1), tail(fs[[i+1]], -1))
}
That assumes myFunction is vectorized (given vectors for inputs, will give a vector for output). If it isn't, you can easily make a version which is.
myFunction <- function(a, b) {a[[1]] + b[[1]]}
for (i in rev(seq_along(fs))[-1]) {
fs[[i]] <- Vectorize(myFunction)(head(fs[[i+1]], -1), tail(fs[[i+1]], -1))
}
In either case, you get
> fs
[[1]]
[1] 453.2 426.8
[[2]]
[1] 233.398 219.802 206.998
[[3]]
[1] 120.200 113.198 106.604 100.394
[[4]]
[1] 61.90298 58.29699 54.90104 51.70293 48.69110
Really, what you have is a starting point
start <- c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110)
a function you want to apply (a made-up example that adds 1 to every element and drops the last one)
myFunction <- function(x) head(x + 1, -1L)
and the number of times you want to apply the function (recursively):
n <- 3L
So I would write a function to apply the function n times recursively, then reverse the output list:
apply.n.times <- function(fun, n, x)
if (n == 0L) list(x) else c(list(x), Recall(fun, n - 1L, fun(x)))
rev(apply.n.times(myFunction, n, start))
# [[1]]
# [1] 64.90298 61.29699
#
# [[2]]
# [1] 63.90298 60.29699 56.90104
#
# [[3]]
# [1] 62.90298 59.29699 55.90104 52.70293
#
# [[4]]
# [1] 61.90298 58.29699 54.90104 51.70293 48.69110
Here is a one-line solution (if myFunction can be replaced with something like sum, or in this case rowSums):
Reduce( function(x,y) rowSums( embed(y,2) ), fs, right=TRUE, accumulate=TRUE )
If myFunction needs to accept 2 values and do something with them then this can be expanded a bit to:
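# rows of embed(y, 2) are (y[t+1], y[t]), so z[2] comes first to keep the question's argument order
Reduce( function(x,y) apply( embed(y,2), 1, function(z) myFunction(z[2],z[1]) ),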
fs, right=TRUE, accumulate=TRUE )
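The Reduce idea can also be combined with the head/tail pairing from the first answer, which keeps the arguments in the question's order (first, second) without going through embed(). A minimal sketch, assuming myFunction is vectorised; the subtraction below is a hypothetical, deliberately non-commutative choice so the order is visible.

```r
fs <- list(NULL, NULL, NULL, c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110))

# Hypothetical non-commutative myFunction
myFunction <- function(a, b) a - b

# With right = TRUE, Reduce calls f(fs[[i]], acc): fs[[i]] (NULL) is ignored and the
# accumulated vector y is paired as (y[1], y[2]), (y[2], y[3]), ...
res <- Reduce(function(x, y) myFunction(head(y, -1), tail(y, -1)),
              fs, right = TRUE, accumulate = TRUE)
lengths(res)
```

Because accumulate = TRUE, res has the same layout as fs, with the original vector in the last position.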
