R doubling list length when subsetting

I am currently trying to build a list in R by subsetting a column of a data frame. My current attempt looks like:
list.level <- unique(buckets$group)
bucket.group <- vector("list",length(list.level))
for(i in list.level){
bucket.group[[i]] <- subset(buckets$group,buckets$group == i)
}
However, instead of filling the list, it seems to keep the original empty elements and append the same number of new elements at the end, returning:
[[1]]
NULL
[[2]]
NULL
...
NULL
[[22]]
NULL
[[23]]
NULL
$A
[1] "A"
$C
[1] "C" "C" "C"
$D
[1] "D" "D" "D"
...
$AJ
[1] "AJ" "AJ" "AJ" "AJ" "AJ"
$AK
[1] "AK" "AK"
A should fill element 1, C element 2, and so on. How do I get these to fill the original elements rather than creating extra elements at the end of the list?

Here is what is going on. Suppose your buckets$group is c("a","a","b","b").
list.level <- unique(buckets$group)
Now list.level is c("a","b")
bucket.group <- vector("list",length(list.level))
Since length(list.level) is 2, bucket.group is now an unnamed list of 2 NULL elements, reachable only by position as bucket.group[[1]] and bucket.group[[2]].
for(i in list.level){
Given the value of list.level, this is the same as for(i in c("a","b")).
bucket.group[[i]] <- subset(buckets$group,buckets$group == i)
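Since i loops over "a" and "b", you fill bucket.group[["a"]] and bucket.group[["b"]]. No elements with those names exist, so R appends them as new named elements at the end, while bucket.group[[1]] and bucket.group[[2]] remain NULL. The list ends up with 4 elements instead of 2.
To fix this, loop over positions rather than values: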
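The appending behaviour can be reproduced on a small toy data frame. This is a sketch assuming a hypothetical `buckets` with a single character column `group`, mirroring the c("a","a","b","b") example above:

```r
# Hypothetical toy data mirroring the example in this answer
buckets <- data.frame(group = c("a", "a", "b", "b"), stringsAsFactors = FALSE)

list.level <- unique(buckets$group)
bucket.group <- vector("list", length(list.level))  # two unnamed NULLs

for (i in list.level) {
  # i is "a", then "b": character indexing appends new named elements
  bucket.group[[i]] <- subset(buckets$group, buckets$group == i)
}

length(bucket.group)  # 4
names(bucket.group)   # ""  ""  "a" "b"
```

The two preallocated slots are never touched, which is exactly the doubling seen in the question.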
list.level <- unique(buckets$group) # ok, this was correct
bucket.group <- list() # just empty list
for(i in 1:length(list.level)){
bucket.group[[i]] <- buckets$group[buckets$group == list.level[[i]] ]
}
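Alternatively, the original value-based loop works if the preallocated list already carries the group names, because bucket.group[["a"]] then refers to an existing slot instead of creating a new one. A minimal sketch, again assuming a hypothetical toy `buckets`:

```r
# Hypothetical toy data
buckets <- data.frame(group = c("a", "a", "b", "b"), stringsAsFactors = FALSE)
list.level <- unique(buckets$group)

# Preallocate with names so that character indexing hits existing slots
bucket.group <- setNames(vector("list", length(list.level)), list.level)

for (i in list.level) {
  bucket.group[[i]] <- subset(buckets$group, buckets$group == i)
}

length(bucket.group)  # 2
names(bucket.group)   # "a" "b"
```

This keeps the list length equal to the number of groups and lets you look groups up by name afterwards.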

I think the issue is with your for statement.
Your loop behaves like this:
list.level <- letters[1:10]
for(i in list.level) print(i)
[1] "a"
[1] "b"
[1] "c"
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
It assigns each element in list.level to i, so i is a letter. When you do
bucket.group[[i]] <- subset(buckets$group,buckets$group == i)
in the first iteration i is "a", so R looks for a list element named "a", does not find one, and appends a new element with that name to store the data. If you use seq_along instead, i is a position:
for(i in seq_along(list.level)) print(i)
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
Now i is always a number, so bucket.group[[i]] fills the preallocated slots; inside the loop, use list.level[i] to get the current group value.
So use seq_along instead.
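Putting both changes together (seq_along for the loop, list.level[i] inside it), a sketch of the corrected loop on hypothetical toy data looks like this:

```r
# Hypothetical toy data standing in for the asker's `buckets`
buckets <- data.frame(group = c("A", "C", "C", "D"), stringsAsFactors = FALSE)

list.level <- unique(buckets$group)
bucket.group <- vector("list", length(list.level))

for (i in seq_along(list.level)) {
  # i is a position; list.level[i] is the group value for that position
  bucket.group[[i]] <- subset(buckets$group, buckets$group == list.level[i])
}

length(bucket.group)  # 3, the same as length(list.level)
```

Unlike 1:length(list.level), seq_along also behaves sensibly if list.level happens to be empty.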

This should work:
list.level <- unique(buckets$group)
bucket.group <- vector("list",length(list.level))
for(i in 1:length(list.level)){
bucket.group[[i]] <- subset(buckets$group,buckets$group == list.level[i])
}
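As a swapped-in alternative not taken from the answers above, base R's split() does this grouping without an explicit loop. A sketch on hypothetical toy data; passing a factor whose levels are unique(...) keeps the groups in order of first appearance instead of sorted order:

```r
# Hypothetical toy data
buckets <- data.frame(group = c("C", "A", "C", "D"), stringsAsFactors = FALSE)

# split() groups the values by a factor; one list element per level
bucket.group <- split(buckets$group,
                      factor(buckets$group, levels = unique(buckets$group)))

names(bucket.group)  # "C" "A" "D"
```

Calling split(buckets, buckets$group) on the whole data frame instead returns one data frame of rows per group.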

Related

R for loops: when to use i in seq_along(x) and when to use i in x

I am very new to R and got stuck on writing for loops. Sometimes I see people write for (i in seq_along(x)), while other times they write for (i in x). What is the difference between the two? Does it depend on the properties of x? Help appreciated!

Consider the following vector x:
x <- LETTERS[1:5]
x
[1] "A" "B" "C" "D" "E"
If you loop over x directly, i takes on each value of x in turn:
for(i in x) print(i)
[1] "A"
[1] "B"
[1] "C"
[1] "D"
[1] "E"
If instead you use seq_along, you are creating an integer sequence of the same length as x:
for(i in seq_along(x)) print(i)
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Which one is appropriate depends on what you're ultimately trying to do. However, I frequently use seq_along because, given an index i, it is trivial to get the value as x[i], whereas recovering the index of a value takes more typing:
for(i in seq_along(x)) print(x[i])
[1] "A"
[1] "B"
[1] "C"
[1] "D"
[1] "E"
Another approach you might sometimes see is 1:length(x). However, as @GregorThomas points out, this can cause unexpected behavior when x is empty.
Consider the following empty vector y:
y <- vector()
for(i in seq_along(y)) print(1+i)
This results in no output because seq_along(y) evaluates to a zero-length vector.
In contrast, consider 1:length(y):
for(i in 1:length(y)) print(1+i)
[1] 2
[1] 1
This is because 1:length(y) evaluates to c(1,0).
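The difference can be checked by recording which values of i the loop body actually sees. A short sketch; seq_len(length(y)) is shown as an equivalent safe form:

```r
y <- vector()  # logical(0), a zero-length vector

iters_seq <- integer(0)
for (i in seq_along(y)) iters_seq <- c(iters_seq, i)

iters_colon <- integer(0)
for (i in 1:length(y)) iters_colon <- c(iters_colon, i)

iters_seq    # integer(0): the body never ran
iters_colon  # 1 0: the body ran twice

seq_len(length(y))  # integer(0), same as seq_along(y)
```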

Processing nested lists in nested for loop

I have 2 variables, and I need to create all combinations of these 2 variables. I have been able to do this with R's combn function and store the combinations in a nested list. Now I need to run a calculation for each combination and store the combined output together. I am trying to store the output in a list, but for some reason the output list is not built correctly. Below is example code:
input_variables <- c("a","b")
output_sublist <- list()
output_biglist <- list()
input_combination_list <- list()
for (i in 1:length(input_variables)) {
input_combination_list[[i]] <- combn(input_variables, i, simplify = FALSE)
for(j in 1:length(input_combination_list[[i]])) {
input_combination_list[[i]][[j]]
output_sublist[[j]] <- input_combination_list[[i]][[j]]
}
output_biglist[[i]] <- output_sublist
}
The output that I get is:
[[1]]
[[1]][[1]]
[1] "a"
[[1]][[2]]
[1] "b"
[[2]]
[[2]][[1]]
[1] "a" "b"
[[2]][[2]]
[1] "b"
What I would like to have is:
[[1]]
[[1]][[1]]
[1] "a"
[[1]][[2]]
[1] "b"
[[2]]
[[2]][[1]]
[1] "a" "b"
I am not sure why there is an extra "b" at the end! Any help would be greatly appreciated. Thanks a lot in advance.

output_sublist for i = 1 is
#[[1]]
#[1] "a"
#[[2]]
#[1] "b"
For i = 2, combn returns only one combination, so the inner loop overwrites only the first element; because output_sublist is never cleared, the second element left over from i = 1 remains:
#[[1]]
#[1] "a" "b"
#[[2]]
#[1] "b"
You need to reset output_sublist at the start of each iteration of i:
for (i in 1:length(input_variables)) {
output_sublist <- list() #Added a line here to clear output_sublist
input_combination_list[[i]] <- combn(input_variables, i, simplify = FALSE)
for(j in 1:length(input_combination_list[[i]])) {
input_combination_list[[i]][[j]]
output_sublist[[j]] <- input_combination_list[[i]][[j]]
}
output_biglist[[i]] <- output_sublist
}
output_biglist
#[[1]]
#[[1]][[1]]
#[1] "a"
#[[1]][[2]]
#[1] "b"
#[[2]]
#[[2]][[1]]
#[1] "a" "b"
However, as mentioned in the comments, we can also do this with lapply:
lapply(seq_along(input_variables), function(x)
combn(input_variables, x, simplify = FALSE))
#[[1]]
#[[1]][[1]]
#[1] "a"
#[[1]][[2]]
#[1] "b"
#[[2]]
#[[2]][[1]]
#[1] "a" "b"
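The question also asks how to run a calculation on each combination. Continuing with lapply, a nested lapply walks each level and then each combination within it. In this sketch the calculation, joining the names with "+", is a hypothetical placeholder for whatever the real computation is:

```r
input_variables <- c("a", "b")

combos <- lapply(seq_along(input_variables), function(k)
  combn(input_variables, k, simplify = FALSE))

# Hypothetical per-combination calculation: join the names with "+"
results <- lapply(combos, function(level)
  lapply(level, function(combo) paste(combo, collapse = "+")))

str(results)
```

Because each level's output is built fresh by lapply, there is no leftover state to clear between iterations, which is what caused the extra "b" in the loop version.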

Delete elements appearing before one element and itself

I have a list of vectors of elements (letters in this example):
(l <- list(letters[1:2], letters[2:3]))
# [[1]]
# [1] "a" "b"
# [[2]]
# [1] "b" "c"
And another element:
(r <- letters[2])
# [1] "b"
The function must delete everything before "b" in each vector, as well as "b" itself, and drop vectors that end up empty.
So the result should be:
# [[1]]
# [1] "c"
Any idea please?
Thank you in advance

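Try the following. It works for this example, but it always drops the first element plus any "b", so it only matches the requirement when "b" is the first or second element: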
out = lapply(l, function(x) x[-c(1,which(x == "b"))])
Filter(length, out)
#[[1]]
#[1] "c"
or, as @akrun suggested, drop everything up to and including the first match of "b":
Filter(length,lapply(l, function(x) x[-seq(match("b",x))]))
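The same idea can be wrapped in a small helper that takes the target from r instead of hard-coding "b". The name drop_through is hypothetical, and keeping vectors that do not contain the target unchanged is an assumption; note that seq(match("b", x)) in the one-liner above errors when "b" is absent, because match returns NA:

```r
l <- list(letters[1:2], letters[2:3])
r <- letters[2]

# Hypothetical helper: drop everything up to and including the first `target`
drop_through <- function(x, target) {
  pos <- match(target, x)    # first position of target, NA if absent
  if (is.na(pos)) return(x)  # assumption: vectors without target are kept as-is
  x[-seq_len(pos)]
}

# Filter(length, ...) removes vectors that became empty
res <- Filter(length, lapply(l, drop_through, target = r))
res
```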

How do I apply an index vector over a list of vectors?

I want to apply a long index vector (50+ non-sequential integers) to a long list of vectors (50+ character vectors containing 100+ names) in order to retrieve specific values (as a list, vector, or data frame).
A simplified example is below:
> my.list <- list(c("a","b","c"),c("d","e","f"))
> my.index <- 2:3
Desired Output
[[1]]
[1] "b"
[[2]]
[1] "f"
##or
[1] "b"
[1] "f"
##or
[1] "b" "f"
I know I can get the same value from each element using:
> lapply(my.list, function(x) x[2])
##or
> lapply(my.list,'[', 2)
I can pull the second and third values from each element by:
> lapply(my.list,'[', my.index)
[[1]]
[1] "b" "c"
[[2]]
[1] "e" "f"
##or
> for(j in my.index) for(i in seq_along(my.list)) print(my.list[[i]][[j]])
[1] "b"
[1] "e"
[1] "c"
[1] "f"
I don't know how to pull just the one value from each element.
I've been looking for a few days and haven't found any examples of this being done, but it seems fairly straightforward. Am I missing something obvious here?
Thank you,
Scott

Whenever you have a problem that is like lapply but involves multiple parallel lists/vectors, consider Map or mapply (Map simply being a wrapper around mapply with SIMPLIFY=FALSE hardcoded).
Try this:
Map("[",my.list,my.index)
#[[1]]
#[1] "b"
#
#[[2]]
#[1] "f"
...or:
mapply("[",my.list,my.index)
#[1] "b" "f"
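If you prefer an explicit index, the same pairing can be written with seq_along and vapply, which additionally checks that every result is a single character value. A sketch; the length check up front is an added assumption, since mapply would otherwise silently recycle a shorter argument:

```r
my.list <- list(c("a", "b", "c"), c("d", "e", "f"))
my.index <- 2:3

# Guard: each vector needs exactly one index
stopifnot(length(my.list) == length(my.index))

# Pair the k-th vector with the k-th index
picked <- vapply(seq_along(my.list),
                 function(k) my.list[[k]][my.index[k]],
                 character(1))
picked
```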

Better way to apply this function to each row of a data frame?

I'd like to apply a function to each row of a data frame, as below. I know how to use apply in the case where the data frame contains only numbers, but what if the rows contain, say, booleans / logicals, strings and integers? Example:
df <- data.frame(x=1:10,
y=c(TRUE, FALSE),
z=letters[1:10],
stringsAsFactors=FALSE)
RowFunction <- function(row) {
if (row$y) return(row$x)
return (row$z)
}
sapply(1:dim(df)[1], function(i) { RowFunction(df[i, ]) })
Is there a better way to do this? My first thought was to use apply(df, 1, RowFunction) after adding row <- as.list(row) to the beginning of RowFunction, but this doesn't work because apply coerces df into an array, which can't handle rows containing different data types.
Just for my R knowledge, I'd like to know if there is a cleaner way to do this than sapply(1:dim(df)[1], ... ). Any ideas?
Thanks in advance!

In this case, you can simply use ifelse:
sapply(1:dim(df)[1], function(i) { RowFunction(df[i, ]) })
[1] "1" "b" "3" "d" "5" "f" "7" "h" "9" "j"
with(df, ifelse(y, x, z))
[1] "1" "b" "3" "d" "5" "f" "7" "h" "9" "j"
For convenience and readability I also used with - this allows you to refer to a column just by name, without using the $ operator.
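For row functions that cannot be vectorised with ifelse, one general pattern is to write the function with one argument per column and let mapply walk the columns in parallel, so no row is ever coerced to a common type as apply would do. The name row_fun is hypothetical; it is a rewrite of RowFunction taking separate arguments:

```r
df <- data.frame(x = 1:10,
                 y = c(TRUE, FALSE),
                 z = letters[1:10],
                 stringsAsFactors = FALSE)

# Hypothetical row function taking one argument per column instead of a row
row_fun <- function(x, y, z) if (y) x else z

# mapply walks the columns in parallel, one row at a time
res <- mapply(row_fun, df$x, df$y, df$z)

# Same thing without spelling out the columns: named columns become named arguments
res2 <- do.call(mapply, c(list(FUN = row_fun), df))

res
```

Because mapply simplifies the results into a single vector, the integers are converted to character here, just as with ifelse; add SIMPLIFY = FALSE to keep a list with each element's own type.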

The ifelse function can also be applied elementwise with mapply, which keeps each element's own type:
mapply(ifelse, df$y, df$x, df$z, SIMPLIFY = FALSE) # returns a list with varying modes
My earlier (more clunky) version:
res <- list()
for(i in seq_along(rownames(df) ) ) { res <- c(res, df[i,1+2*!df[i,"y"] ]) }
res
#--------
[[1]]
[1] 1
[[2]]
[1] "b"
[[3]]
[1] 3
[[4]]
[1] "d"
[[5]]
[1] 5
[[6]]
[1] "f"
[[7]]
[1] 7
[[8]]
[1] "h"
[[9]]
[1] 9
[[10]]
[1] "j"
