If you have a list of files and you want to compare one of them against a set of the others, how do you do it?
my.test <- list[1]
my.reference.set <- list[-1]
This works, of course, but I want to do it in a loop, with my.test varying each time, so that each file in the list is my.test for one iteration. That is, I have a list of 250 files, and I want to do this for every subset of 12 files within it.
> num <- (1:2)
> sdasd<- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
> splitlist<- split(sdasd, num)
> splitlist
$`1`
[1] "asds" "nasd" "nadsd"
$`2`
[1] "ksad" "ksasd" "kasdih"
> for (i in splitlist) {my.test <- splitlist[i] # "asds"
+ my.reference.set <- splitlist[-i] # "nasd" and "nadsd"
+ combined <- data.frame (my.test, my.reference.set)
+ combined}
Error in -i : invalid argument to unary operator
>
Then I want the next iteration to be:
my.test <- splitlist[i] #my.test to be "nasd"
my.reference.set <- splitlist[-i] # "asds" and "nadsd"
}
and finally for splitlist[1],
my.test <- splitlist[i] # "nadsd"
my.reference.set <- splitlist[-i] # "asds" and "nasd"
}
Then the same for splitlist[2]
Does this do what you want? The key point is to loop over the indices of the list (with seq_along) rather than over its elements, because x[-n] indexing only works when n is a positive integer (with some obscure exceptions). Also, I wasn't sure whether you wanted the results as a data frame or a list; the latter allows the components to have different lengths.
num <- 1:2
sdasd <- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
splitlist <- split(sdasd, num)
L <- vector("list", length(splitlist))
for (i in seq_along(splitlist)) {
  my.test <- splitlist[[i]]          # for i = 1: "asds" "nasd" "nadsd"
  my.reference.set <- splitlist[-i]  # for i = 1: a list holding the other group
  L[[i]] <- list(test = my.test, ref.set = my.reference.set)
}
edit: I'm still a little confused by your example above, but I think this is what you want:
refs <- lapply(splitlist,
               function(S) {
                 lapply(seq_along(S),
                        function(i) {
                          list(test = S[i], ref.set = S[-i])
                        })
               })
refs is a nested list; the top level has length 2 (the length of splitlist), each of the next levels has length 3 (the lengths of the elements of splitlist), and each of the bottom levels has length 2 (test and reference set).
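To make the nesting concrete, here is a small sketch that rebuilds refs from the example data and pulls out one test/reference pair. It uses only base R and the names from the answer above; the choice of which pair to inspect is arbitrary.

```r
sdasd <- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
splitlist <- split(sdasd, 1:2)

# One leave-one-out split per element of each group
refs <- lapply(splitlist, function(S) {
  lapply(seq_along(S), function(i) list(test = S[i], ref.set = S[-i]))
})

# Second split of the first group: "nasd" is tested against the rest
refs[["1"]][[2]]$test     # "nasd"
refs[["1"]][[2]]$ref.set  # "asds" "nadsd"
```

The top level is indexed by group name ("1" or "2"), the middle level by the position of the held-out element, and the bottom level by test or ref.set.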
I'm trying to make a list with 10 elements, each element consisting of 5 * i items drawn from a uniform distribution, i being the ith entry, and I want to use lapply.
Currently I made this function:
z_list <- list()
z_list_generator <- function(n) {
for(i in 1:n){
a <- runif(5 * i)
tmp <- list(a)
mybiglist[[i]] <- tmp
}
mybiglist
}
This function does give the correct outcome when I just put z_list_generator(2), it prints a list with the first element consisting of 5 elements, the second of 10 elements.
What I want to achieve is that I do lapply(some number, z_list_generator) such that it generates this same list, and such that when I do length(lapply(some number, z_list_generator)), the outcome is 'some number'.
Do you mean something like this?
z_list_generator <- function(k) lapply(1:k, function(i) runif(5 * i))
set.seed(2018) # Fixed random seed for reproducibility
z_list_generator(2)
#[[1]]
#[1] 0.33615347 0.46372327 0.06058539 0.19743361 0.47431419
#
#[[2]]
# [1] 0.3010486 0.6067589 0.1300121 0.9586547 0.5468495 0.3956160 0.6645386
# [8] 0.9821123 0.6782154 0.8060278
length(z_list_generator(2))
#[1] 2
Your z_list_generator is strange.
1) You do not initialise mybiglist inside the function, so it only works if a variable of that name already exists in the global environment (the function then modifies a local copy of it).
2) You assign each element of mybiglist another list (of length 1), whose first element contains a sample from a uniform distribution. Better to assign a, not tmp, there.
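A minimal sketch of the loop version with both points applied, keeping the names from the question. Initialising with vector("list", n) is one reasonable choice here, not the only one.

```r
z_list_generator <- function(n) {
  mybiglist <- vector("list", n)    # initialise inside the function
  for (i in seq_len(n)) {
    mybiglist[[i]] <- runif(5 * i)  # store the vector itself, not list(a)
  }
  mybiglist
}

res <- z_list_generator(3)
length(res)   # 3
lengths(res)  # 5 10 15
```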
The idea:
Suppose I have a list with two vectors. I would like to take the first element of the first vector and divide it by the sum of itself and the first element of the second vector. Then do that for all elements of the first vector. After that, do the same thing for the second vector of the list.
The code of the lists:
tau1 <- list(c(0.43742669 , 0.64024429, 0.39660069, 0.11849773), c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
My working code, for only two vectors:
Tau1 <- vector('list', 2)
for(i in seq_along(tau1)){
for(j in 1:length(tau1[[1]])){
Tau1[[i]][[j]] <- tau1[[i]][[j]] / Reduce('+', tau1[[1]][[j]], tau1[[2]][[j]])
}
}
Example:
First element of the list:
TT1 <- tau1[[1]][[1]]/(tau1[[1]][[1]]+tau1[[2]][[1]])
[1] 0.4636196
Then for the second element of the list:
TT2 <- tau1[[2]][[1]]/(tau1[[1]][[1]]+tau1[[2]][[1]])
[1] 0.5363804
The problem:
I would like to do that for arbitrary number of vectors. For example,
Reduce('+', tau1[[1]][[j]], tau1[[2]][[j]], tau1[[3]][[j]], tau1[[4]][[j]])
How can I do that automatically? Any help, please?
If we use Reduce, we should pass it the whole list, without the [[1]][[j]] subscripts, so that it sums the corresponding elements across the vectors and returns a vector of totals. We then take the 'j'th element of that vector as the divisor for the 'j'th element of 'tau1[[i]]':
Tau1 <- vector('list', length(tau1))
for(i in seq_along(tau1)){
  for(j in seq_along(tau1[[1]])){
    Tau1[[i]][[j]] <- tau1[[i]][[j]] / Reduce(`+`, tau1)[j]
  }
}
Regarding the error mentioned in the comments: it can occur when there are non-numeric elements. The OP mentioned NULL elements, but NULL can only occur as a whole element of a list, not inside a numeric vector, so it is possible that the data contain the character string "NULL". For example:
tau1 <- list(c(0.43742669 , 0.64024429, "NULL", 0.11849773),
c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
Upon running the code above
Error in f(init, x[[i]]) : non-numeric argument to binary operator
To work with any number of list elements, use the apply family of functions with [[.
To extract the j-th element of each sub-list use sapply(tau1, "[[", j). To sum those elements use sum(sapply(tau1, "[[", j)).
PS: Instead of for(j in 1:length(tau1[[1]])){} you should have for(j in 1:length(tau1[[i]])){}, just in case.
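Put together, the sapply suggestion could look like the sketch below. It assumes every vector in tau1 has the same length (otherwise "[[" fails for the shorter ones); total_j is a name introduced here for illustration.

```r
tau1 <- list(c(0.43742669, 0.64024429, 0.39660069, 0.11849773),
             c(0.5060767, 0.4857891, 0.4553237, 0.5045598))

Tau1 <- vector("list", length(tau1))
for (i in seq_along(tau1)) {
  Tau1[[i]] <- numeric(length(tau1[[i]]))
  for (j in seq_along(tau1[[i]])) {
    # total of the j-th elements across all vectors in the list
    total_j <- sum(sapply(tau1, "[[", j))
    Tau1[[i]][j] <- tau1[[i]][j] / total_j
  }
}

round(Tau1[[1]][1], 7)  # 0.4636196, matching TT1 in the question
```

Because the divisor is built from the whole list, the same loop works unchanged when tau1 holds three, four, or more vectors.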
Here is a base R one-liner:
lapply(tau1, "/", do.call(mapply, c(FUN = sum, tau1)))
# [[1]]
# [1] 0.4636196 0.5685838 0.4655351 0.1901875
#
# [[2]]
# [1] 0.5363804 0.4314162 0.5344649 0.8098125
Or alternatively (from @lmo's comment):
lapply(tau1, "/", Reduce("+", tau1))
Here is a purrr equivalent:
library(purrr)
tau1 %>% map(`/`, pmap_dbl(., sum))
# [[1]]
# [1] 0.4636196 0.5685838 0.4655351 0.1901875
#
# [[2]]
# [1] 0.5363804 0.4314162 0.5344649 0.8098125
In R I have a list of 100 phylo objects called Newick1, Newick2, Newick3, etc. I want to do pairwise comparisons between the trees (e.g. all.equal.phylo(Newick1, Newick2)) but am having difficulty figuring out how to do this efficiently since each file has a different name.
I think something like the for loop below will work, but how do I designate a different file for each iteration of the loop? For obvious reasons the [i] and [j] I put in the code below don't work, but I don't know what to replace them with.
Thank you very much!
for (i in 1:99) {
for (j in i+1:100) {
all.equal.phylo(Newick[i], Newick[j]) -> output[i,j]
} }
Try mget() to reference multiple objects by name:
> x1 <- x2 <- x3 <-1
> mget(paste0("x",1:3))
$x1
[1] 1
$x2
[1] 1
$x3
[1] 1
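As a rough sketch of how mget() could feed the pairwise loop from the question: the objects below are plain strings standing in for phylo objects, and compare_fn is a hypothetical placeholder for all.equal.phylo, so that the example runs without any package. combn() enumerates each unordered pair once, which also sidesteps the i+1:100 in the question (that expression parses as i + (1:100), not (i+1):100).

```r
# Stand-in objects; in the real case these would be the phylo objects
Newick1 <- "((A,B),C);"
Newick2 <- "((A,B),C);"
Newick3 <- "((A,C),B);"
Newick4 <- "((B,C),A);"

trees <- mget(paste0("Newick", 1:4))  # named list of the four objects
compare_fn <- identical               # placeholder for all.equal.phylo

pairs <- combn(length(trees), 2)      # one column per pair with i < j
output <- matrix(NA, length(trees), length(trees),
                 dimnames = list(names(trees), names(trees)))
for (k in seq_len(ncol(pairs))) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  output[i, j] <- compare_fn(trees[[i]], trees[[j]])
}
output
```

Only the upper triangle of output is filled; the diagonal and lower triangle stay NA.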
You can try a variation on the following:
# make a two-column data frame of index pairs
# and drop the rows where both indices are identical
df <- expand.grid(1:100, 1:100)
names(df) <- c('i', 'j')
df <- df[df$i != df$j, ]
# example function that takes two parameters
addtwo <- function(i, j) { i + j }
# apply that function across rows of the data frame
results <- mapply(addtwo, df$i, df$j)
# using the same logic,
# your function would look something like this
# (NEWICKS is assumed to be a list holding the phylo objects)
getdistance <- function(i, j, newicks = NEWICKS) {
  all.equal.phylo(newicks[[i]], newicks[[j]])
}
# and apply it like this
results <- mapply(getdistance, df$i, df$j)
Key concepts:
expand.grid()
mapply()
I have a list of records:
z <- list(list(a=1),list(a=4),list(a=2))
and I try to add fields to each of them.
Alas, neither
lapply(z,function(l) l$b <- 1+l$a)
nor
for(l in z) l$b <- 1+l$a
modifies z.
In this simple case I can, of course, do
z <- lapply(z,function(l) c(list(b= 1+l$a),l))
but this quickly gets out of hand when the lists have more nesting:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
How do I turn it into
list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
without repeating the definition of the whole structure?
Each element of z has many fields, not just a; and z[[10]]$a has many subfields, not just b.
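Your first code example doesn't modify the list because the anonymous function returns the value of the assignment rather than the modified element, and the result of lapply is never stored. You need to return the list from the function and keep what lapply gives back: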
z <- list(list(a=1),list(a=4),list(a=2))
expected <- list(list(a=1, b=2), list(a=4, b=5), list(a=2, b=3))
outcome <- lapply(z,function(l) {l$b <- 1+l$a ; l})
all.equal(expected, outcome)
# [1] TRUE
In the doubly nested example, you could use lapply within lapply, again making sure to return the list in the inner lapply:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
expected <- list(list(a=list(b=1, c=2)), list(a=list(b=4, c=5)), list(a=list(b=2, c=3)))
obtained <- lapply(z, function(l1) { lapply(l1, function(l2) {l2$c = l2$b+1 ; l2 } )})
all.equal(expected, obtained)
# [1] TRUE
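If only one known sub-field needs extending, a single lapply with a nested assignment may be enough. This is a sketch under the assumption that the field to extend is always called a; unlike the nested lapply, it leaves any other fields of each element untouched.

```r
z <- list(list(a = list(b = 1)), list(a = list(b = 4)), list(a = list(b = 2)))
expected <- list(list(a = list(b = 1, c = 2)),
                 list(a = list(b = 4, c = 5)),
                 list(a = list(b = 2, c = 3)))

obtained <- lapply(z, function(l) {
  l$a$c <- l$a$b + 1  # add the new field on the local copy of the element
  l                   # return the modified element
})

all.equal(expected, obtained)
```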
Another, somewhat convoluted, option:
z <- list(list(a=1),list(a=4),list(a=2))
res <- list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
res1 <- rapply(z,function(x) list(b = x,c = x+1),how = "replace")
> all.equal(res,res1)
[1] TRUE
I only say convoluted because rapply can be tricky to use at times (for me at least).
I have a for loop in R in which I want to store the result of each calculation (for all the values looped through). In the for loop a function is called and the output is stored in a variable r in the moment. However, this is overwritten in each successive loop. How could I store the result of each loop through the function and access it afterwards?
Thanks,
Example:
for (par1 in 1:n) {
var<-function(par1,par2)
c(var,par1)->var2
print(var2)
}
So print returns every instance of var2, but var2 itself only holds the value from the last iteration. Is there any way to get an array of the data or something?
Initialise an object of the right length and then assign each value by index:
a <- numeric(10)
for (i in 1:10) {
  a[i] <- mean(rnorm(50))
}
print(a)
EDIT:
To include an example with two output variables, in the most basic case, create an empty matrix with the number of columns corresponding to your output parameters and the number of rows matching the number of iterations. Then save the output in the matrix, by indexing the row position in your for loop:
n <- 10
mat <- matrix(ncol = 2, nrow = n)
for (i in 1:n) {
  var1 <- function_one(i, par1)
  var2 <- function_two(i, par2)
  mat[i, ] <- c(var1, var2)
}
print(mat)
The iteration number i corresponds to the row number in the mat object. So there is no need to explicitly keep track of it.
However, this is just to illustrate the basics. Once you understand the above, it is more efficient to use the elegant solution given by @eddi, especially if you are handling many output variables.
To get a list of results:
n = 3
lapply(1:n, function(par1) {
  # your function and whatnot, e.g.
  par1*par1
})
Or sapply if you want a vector instead.
A bit more complicated example:
n = 3
some_fn = function(x, y) { x + y }
par2 = 4
lapply(1:n, function(par1) {
  var = some_fn(par1, par2)
  return(c(var, par1)) # don't have to type return, but I chose to make it explicit here
})
#[[1]]
#[1] 5 1
#
#[[2]]
#[1] 6 2
#
#[[3]]
#[1] 7 3
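For completeness, a short sketch of the two follow-ups mentioned above: sapply to get a plain vector, and do.call(rbind, ...) as one common way to stack the list of results into a matrix comparable to the mat example. The names res_list, res_mat and res_vec are introduced here for illustration.

```r
n <- 3
par2 <- 4
some_fn <- function(x, y) x + y

# List of length-2 results, as in the lapply example
res_list <- lapply(1:n, function(par1) c(some_fn(par1, par2), par1))

# Stack the results: one row per iteration, one column per output
res_mat <- do.call(rbind, res_list)

# sapply simplifies scalar results to a vector
res_vec <- sapply(1:n, function(par1) par1 * par1)

res_mat  # rows: 5 1 / 6 2 / 7 3
res_vec  # 1 4 9
```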