Better way to apply this function to each row of a data frame? - r

I'd like to apply a function to each row of a data frame, as below. I know how to use apply in the case where the data frame contains only numbers, but what if the rows contain, say, booleans / logicals, strings and integers? Example:
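df <- data.frame(x=1:10,
                 y=rep(c(TRUE, FALSE), 5),  # assumed: reconstructed from the output shown in the answers
                 z=letters[1:10],           # assumed: reconstructed from the output shown in the answers
                 stringsAsFactors=FALSE)
RowFunction <- function(row) {
  if (row$y) return(row$x)
  return (row$z)
}
sapply(1:dim(df)[1], function(i) { RowFunction(df[i, ]) })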
Is there a better way to do this? My first thought was to use apply(df, 1, RowFunction) after adding row <- as.list(row) to the beginning of RowFunction, but this doesn't work because apply coerces df into an array, which can't handle rows containing different data types.
Just for my R knowledge, I'd like to know if there is a cleaner way to do this than sapply(1:dim(df)[1], ... ). Any ideas?
Thanks in advance!

In this case, you can simply use ifelse:
sapply(1:dim(df)[1], function(i) { RowFunction(df[i, ]) })
[1] "1" "b" "3" "d" "5" "f" "7" "h" "9" "j"
with(df, ifelse(y, x, z))
[1] "1" "b" "3" "d" "5" "f" "7" "h" "9" "j"
For convenience and readability I also used with - this allows you to refer to a column just by name, without using the $ operator.
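A minimal sketch to check that the vectorised call and the row-by-row call agree. The y and z columns below are assumptions, picked so that the result matches the output printed above; the question's own column definitions are not shown in full.

```r
# Hypothetical sample data: y and z are assumed, chosen to reproduce
# the "1" "b" "3" "d" ... pattern shown in the answer.
df <- data.frame(x = 1:10,
                 y = rep(c(TRUE, FALSE), 5),
                 z = letters[1:10],
                 stringsAsFactors = FALSE)

RowFunction <- function(row) {
  if (row$y) return(row$x)
  return(row$z)
}

# Row-by-row version from the question
by_row <- sapply(seq_len(nrow(df)), function(i) RowFunction(df[i, ]))

# Vectorised version: one call, no explicit loop over rows
vectorised <- with(df, ifelse(y, x, z))

# Compare the two results
identical(by_row, vectorised)
```

Both results come back as character vectors, because mixing integers and strings in one vector coerces everything to character.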

The ifelse function can do it with mapply:
mapply(ifelse, df$y, df$x, df$z, SIMPLIFY = FALSE) # walks y, x and z in parallel; returns a list with varying modes
My earlier (more clunky) version:
res <- list()
for(i in seq_along(rownames(df) ) ) { res <- c(res, df[i,1+2*!df[i,"y"] ]) }
[1] 1
[1] "b"
[1] 3
[1] "d"
[1] 5
[1] "f"
[1] 7
[1] "h"
[1] 9
[1] "j"


Processing nested lists in nested for loop

I have 2 variables, and I need to create all combinations using these 2 variables. I have been able to achieve this using R combn function, and finally store the combinations within a nested list. Now I need to run some calculation for each combination and store the combined output together. I am trying to store the output in a list but for some reason the output list is not being generated the correct way. Below is an example code:
input_variables <- c("a","b")
output_sublist <- list()
output_biglist <- list()
input_combination_list <- list()
for (i in 1:length(input_variables)) {
  input_combination_list[[i]] <- combn(input_variables, i, simplify = FALSE)
  for(j in 1:length(input_combination_list[[i]])) {
    output_sublist[[j]] <- input_combination_list[[i]][[j]]
  }
  output_biglist[[i]] <- output_sublist
}
The output that I get is:
[1] "a"
[1] "b"
[1] "a" "b"
[1] "b"
What I would like to have is:
[1] "a"
[1] "b"
[1] "a" "b"
I am not sure why there is an extra "b" in the end!! Any help would be greatly appreciated. Thanks a lot in advance.
output_sublist for i = 1 is
#[1] "a"
#[1] "b"
For i = 2, output_sublist has not been cleared, so only its first element is replaced and the second element from the previous iteration stays as it is:
#[1] "a" "b"
#[1] "b"
You need to clear output_sublist after each iteration of i.
for (i in 1:length(input_variables)) {
  output_sublist <- list() #Added a line here to clear output_sublist
  input_combination_list[[i]] <- combn(input_variables, i, simplify = FALSE)
  for(j in 1:length(input_combination_list[[i]])) {
    output_sublist[[j]] <- input_combination_list[[i]][[j]]
  }
  output_biglist[[i]] <- output_sublist
}
#[1] "a"
#[1] "b"
#[1] "a" "b"
However, as mentioned in the comments we can do this with lapply as well
lapply(seq_along(input_variables), function(x)
combn(input_variables, x, simplify = FALSE))
#[1] "a"
#[1] "b"
#[1] "a" "b"
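The leftover element can also be reproduced in isolation, without the loops. This is only a sketch; sub and fresh are made-up names standing in for output_sublist before and after the fix.

```r
# State of the sublist after the i = 1 pass
sub <- list("a", "b")

# The i = 2 pass writes only slot 1; slot 2 is never touched
sub[[1]] <- c("a", "b")
length(sub)   # the old second element is still there

# Starting from an empty list avoids the stale element
fresh <- list()
fresh[[1]] <- c("a", "b")
length(fresh)
```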

R doubling list length when subsetting

I am currently trying to subset a list in R from a dataframe. My current attempt looks like:
list.level <- unique(buckets$group)
result <- vector("list",length(list.level))  # "result": placeholder, the list's name is missing in the source
for(i in list.level){ result[[i]] <- subset(buckets$group,buckets$group == i) }
However, instead of filling the list it seems to create a duplicate list of the same amount of rows, returning:
[1] "A"
[1] "C" "C" "C"
[1] "D" "D" "D"
[1] "AJ" "AJ" "AJ" "AJ" "AJ"
[1] "AK" "AK"
A should be filling into 1, C into 2, etc. etc. How do I get these to fill in the original rows rather than creating extra rows at the bottom of the list?
Here is what is going on. Suppose your buckets$group is c("a","a","b","b").
list.level <- unique(buckets$group)
Now list.level is c("a","b").
result <- vector("list",length(list.level))  # "result": placeholder, the list's name is missing in the source
Since length(list.level) is 2, your list now holds 2 NULL elements, at positions 1 and 2.
for(i in list.level){
Recalling the value of list.level, this is the same as for(i in c("a","b")).
result[[i]] <- subset(buckets$group,buckets$group == i)
Since i loops over "a" and "b", you fill result[["a"]] and result[["b"]], which are appended as new named elements, while result[[1]] and result[[2]] remain untouched.
To fix this, you should write instead
list.level <- unique(buckets$group) # ok, this was correct
result <- list() # just an empty list ("result": placeholder, the list's name is missing in the source)
for(i in 1:length(list.level)){ result[[i]] <- buckets$group[buckets$group == list.level[[i]] ] }
I think the issue is with your for statement.
Your code is like this:
> for(i in list.level) print(i)
[1] "a"
[1] "b"
[1] "c"
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
It assigns each element in list.level to i, so i is a letter. When you do
result[[i]] <- subset(buckets$group,buckets$group == i)  # "result": placeholder, the list's name is missing in the source
in the first iteration, i is a letter. So it looks for a list element named "a", does not find it, and therefore creates it and stores the data there. If instead you use seq_along
for(i in seq_along(list.level)) print(i)
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
now i will always be a number and the code will do what you want.
So use seq_along instead.
this should work:
list.level <- unique(buckets$group)
result <- vector("list",length(list.level))  # "result": placeholder, the list's name is missing in the source
for(i in 1:length(list.level)){ result[[i]] <- subset(buckets$group,buckets$group == list.level[i]) }
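To see the name-versus-position effect described in the answers, here is a small self-contained sketch. The groups vector and the by_name / by_index lists are made up for illustration and stand in for buckets$group and the question's list.

```r
# Hypothetical stand-in for buckets$group
groups <- c("a", "a", "b", "b")
list.level <- unique(groups)

# Indexing by the level itself: i is a string, so new named
# elements get appended after the two preallocated NULL slots
by_name <- vector("list", length(list.level))
for (i in list.level) {
  by_name[[i]] <- groups[groups == i]
}
length(by_name)
names(by_name)

# Indexing by position: i is a number, so the preallocated slots are filled
by_index <- vector("list", length(list.level))
for (i in seq_along(list.level)) {
  by_index[[i]] <- groups[groups == list.level[i]]
}
length(by_index)
```

If named elements are what you actually want, split(groups, groups) might be worth a look as an alternative, though that is a different approach from the ones suggested above.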

How do I apply an index vector over a list of vectors?

I want to apply a long index vector (50+ non-sequential integers) to a long list of vectors (50+ character vectors containing 100+ names) in order to retrieve specific values (as a list, vector, or data frame).
A simplified example is below:
> my.list <- list(c("a","b","c"),c("d","e","f"))
> my.index <- 2:3
Desired Output
[1] "b"
[1] "f"
[1] "b"
[1] "f"
[1] "b" "f"
I know I can get the same value from each element using:
> lapply(my.list, function(x) x[2])
> lapply(my.list,'[', 2)
I can pull the second and third values from each element by:
> lapply(my.list,'[', my.index)
[1] "b" "c"
[1] "e" "f"
> for(j in my.index) for(i in seq_along(my.list)) print(my.list[[i]][[j]])
[1] "b"
[1] "e"
[1] "c"
[1] "f"
I don't know how to pull just the one value from each element.
I've been looking for a few days and haven't found any examples of this being done, but it seems fairly straightforward. Am I missing something obvious here?
Thank you,
Whenever you have a problem that is like lapply but involves multiple parallel lists/vectors, consider Map or mapply (Map simply being a wrapper around mapply with SIMPLIFY=FALSE hardcoded).
Try this:
#[1] "b"
#[1] "f"
#[1] "b" "f"
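The calls that produced this output are not shown in the answer. A sketch that is consistent with it, assuming the intent was to pair each list element with the matching entry of my.index and apply `[` to each pair:

```r
my.list <- list(c("a", "b", "c"), c("d", "e", "f"))
my.index <- 2:3

# Map walks my.list and my.index in parallel and always returns a list
as_list <- Map(`[`, my.list, my.index)

# mapply does the same but simplifies the result to a vector where it can
as_vector <- mapply(`[`, my.list, my.index)

as_list
as_vector
```

This relies on my.list and my.index having the same length; if they differ, mapply recycles the shorter one, which is probably not what you want here.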

R: Replace all values in a list according to rules in other dataframe

I would like to replace all values in a list (An) with their respective values in another dataframe.
I've created a simple example above with only 3 elements, but I have a much larger and more diverse list in my data (hence a simple ifelse recode would be too long). So here I want "a" to be replaced with 1, b with 2 and c with 3 across the whole list because these are the "rules" found in LK.
Is there a way to tell R: look at each element in An, find a match in LK$Strn and replace An with LK$String ?
So the beginning of resulting list will be
[1] "a" "b" "c"
[1] "a" "c" "b"
Obviously the full resulting list will be the same size as An.
I've tried match() but I must be doing something wrong...
Any help would be greatly appreciated.
You can do it with a quick lapply like so...
res <- lapply( An , function(x){ x <- as.character( LK[ match( x , LK$Strn ) , "String" ] ) } )
# [[1]]
# [1] "a" "b" "c"
# [[2]]
# [1] "a" "c" "b"
# [[3]]
# [1] "c" "a" "b"
# [[4]]
# [1] "c" "b" "a"
# [[5]]
# [1] "b" "c" "a"
# [[6]]
# [1] "b" "a" "c"
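Since the definitions of An and LK are not shown, here is a runnable sketch with made-up data. The contents of An and LK below are assumptions; only the column names Strn and String come from the question.

```r
# Hypothetical lookup table: Strn holds the values found in An,
# String holds what they should be replaced with
LK <- data.frame(Strn = 1:3,
                 String = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# Hypothetical input list
An <- list(c(1, 2, 3), c(1, 3, 2))

# match() gives the row of LK for each value; indexing String by
# those rows performs the replacement for a whole vector at once
res <- lapply(An, function(x) LK$String[match(x, LK$Strn)])
res
```

Values in An that have no entry in LK$Strn come back as NA, which is a quick way to spot gaps in the lookup table.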

how to split a vector with mixed variables into two separate vectors in R

I extracted a mixed variable which includes both numeric and string values from a data file using strsplit function. I ended up with a variable just as seen below:
> sample3
[1] "1200" "A"
[1] "1193" "A"
[1] "1117" "B"
[1] "5663"
[1] "7003" "C"
[1] "1205" "A"
[1] "2100" "D"
[1] "1000" "D"
[1] "D"
[1] "1000" "B"
I need to split this into two variables/vectors (or convert it to a two-column matrix). I tried calling unlist(sample3) and putting all the values into a matrix with ncol=2, but because some data points are missing the values end up in the wrong columns. I think I need to solve the missing data issue before building the two-column matrix. Does anyone have any idea on this issue? Any help will be greatly appreciated.
Something like this will work
# dummy data
x <- list(c('100','a'), '100', c('a'), c('1000','b'))
numeric_x <- unlist(lapply(x,function(x) {.x <- head(x,1); as.numeric(.x)}))
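# the if() condition is missing in the source; "last element is not a number" is an assumed reconstruction
character_x <- unlist(lapply(x,function(x) {.x <- tail(x,1); if(is.na(suppressWarnings(as.numeric(.x)))) {return(.x)} else {return(NA)}}))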
There will be a much nicer regex answer I am sure
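One possible shape for such a regex version, as a sketch only. It assumes every element holds at most one all-digit string and at most one all-letter string; pick is a made-up helper name and the data is a shortened stand-in for sample3.

```r
# Shortened stand-in for sample3
x <- list(c("1200", "A"), "5663", "D", c("1000", "B"))

# Hypothetical helper: first entry matching the pattern, or NA if none does
pick <- function(parts, pattern) {
  hit <- parts[grepl(pattern, parts)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Digits-only entries become the numeric column
numeric_x <- as.numeric(vapply(x, pick, character(1), pattern = "^[0-9]+$"))

# Letters-only entries become the character column
character_x <- vapply(x, pick, character(1), pattern = "^[A-Za-z]+$")

data.frame(number = numeric_x, code = character_x, stringsAsFactors = FALSE)
```

Because each column is filled by pattern rather than by position, a missing number or a missing letter ends up as NA in the right place instead of shifting the remaining values.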
