Pass function through specific columns with lapply or for loop - r

I have created a function that reorganizes a data frame into a list. I want to apply the function to every column of the data frame except the first two; however, lapply is returning strange results.
Here is a reproducible example:
names <- c("A", "B", "C", "D")
titles <- c("P", "S", "S", "P")
day1 <- c(1,0,1,0)
day2 <- c(0,0,1,1)
day3 <- c(1,1,0,0)
df <- data.frame(names, titles, day1, day2, day3)
ids <-df[,1:2]
obs <- df[,3:5]
I then create a function which searches one "day" column and reports the "name" and "title" of every row with a 1, that is, a value greater than 0 (it also removes duplicated values).
group_maker1 <- function(x){
  g1 <- ids$names[obs[,x]> 0]
  g2 <- ids$titles[obs[,x]> 0]
  temp <- c(g1,g2)
  temp <- temp[!duplicated(temp)]
  paste(temp)
}
#test group_maker
> group_maker1(3)
[1] "A" "B" "P" "S"
In the actual data frame, there are many (>300) columns of "days". I want to pass this group_maker function through each column of "days" to the nth day.
I've tried running it in a for loop, but the output doesn't seem to be stored anywhere:
for(i in 1:nrow(df)) { # for-loop over columns
group_maker1 <- function(x){
g1 <- ids$names[obs[,x]> 0]
g2 <- ids$titles[obs[,x]> 0]
temp <- c(g1,g2)
temp <- temp[!duplicated(temp)]
paste(temp)
}
}
Alternatively, I tried lapply, which seems more promising as it gives an output; however, "NA"s are present, and it's not reporting any of the "B" names:
lapply(obs[,1:3], group_maker1)
$day1
[1] "A" "C" "NA" "P" "S"
$day2
[1] "A" "C" "NA" "P" "S"
$day3
[1] "A" "C" "NA" "P" "S"
This is the desired output, however the values within it are incorrect. I want it to return the output as seen above in the group_maker1(3) line but with the correct values for each column of days (i.e. no "NA's" and all of the values in that column)
Essentially, I want the loop/apply to pass the function through each column of "days" and provide an output of all the "names" and "titles" for each day in the form of a list.

Using your test data, we have
> group_maker1(1)
[1] "A" "C" "P" "S"
> group_maker1(2)
[1] "C" "D" "S" "P"
> group_maker1(3)
[1] "A" "B" "P" "S"
So, we can replicate using a for loop with
> for(i in 1:3) print(group_maker1(i))
[1] "A" "C" "P" "S"
[1] "C" "D" "S" "P"
[1] "A" "B" "P" "S"
or using lapply with
> lapply(1:3, group_maker1)
[[1]]
[1] "A" "C" "P" "S"
[[2]]
[1] "C" "D" "S" "P"
[[3]]
[1] "A" "B" "P" "S"
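Both of your attempts failed for simple reasons. The for loop only redefines group_maker1 on every iteration and never calls it, so nothing is computed or stored (it also iterates over 1:nrow(df) rather than over the columns). The lapply call passes the columns of obs themselves, so x receives a column's values instead of a column index.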
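Since the question also asks for the results to be stored and labelled by day, here is a minimal sketch of both loops keeping their output in a named list. It reuses the question's data and group_maker1; the names results_for and results_lapply are only illustrative, and stringsAsFactors = FALSE is an assumption added so that the names stay character vectors in older versions of R.

```r
# Data and function from the question
names  <- c("A", "B", "C", "D")
titles <- c("P", "S", "S", "P")
df <- data.frame(names, titles,
                 day1 = c(1, 0, 1, 0),
                 day2 = c(0, 0, 1, 1),
                 day3 = c(1, 1, 0, 0),
                 stringsAsFactors = FALSE)  # assumption: keep strings as character
ids <- df[, 1:2]
obs <- df[, 3:5]

group_maker1 <- function(x) {
  g1 <- ids$names[obs[, x] > 0]
  g2 <- ids$titles[obs[, x] > 0]
  temp <- c(g1, g2)
  temp <- temp[!duplicated(temp)]
  paste(temp)
}

# for loop: preallocate a list, then assign into it by column index
results_for <- vector("list", ncol(obs))
names(results_for) <- names(obs)
for (i in seq_along(obs)) {
  results_for[[i]] <- group_maker1(i)
}

# lapply: iterate over the column indices, then label the elements
results_lapply <- setNames(lapply(seq_along(obs), group_maker1), names(obs))
```

Both objects are lists with elements day1, day2 and day3, so results_for$day3 holds the same vector as group_maker1(3).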
Or, taking a completely different approach to avoid the explicit use of loops altogether
library(tidyverse)
df %>%
  pivot_longer(
    starts_with("day"),
    names_to="col",
    values_to="val"
  ) %>%
  group_by(col) %>%
  group_map(
    function(.x, .y) {
      z <- .x %>% filter(val > 0)
      c(z %>% pull(names) %>% unique(), z %>% pull(titles) %>% unique())
    }
  )
[[1]]
[1] "A" "C" "P" "S"
[[2]]
[1] "C" "D" "S" "P"
[[3]]
[1] "A" "B" "P" "S"
This final option could be shorter if there were no need to deal with awkward input and output formats.

Related

How to code to get an output vector list of unique elements based satisfying two conditions?

I'm trying to get a list of unique elements based on conditions on two columns in R.
For example, I have 4 groups and I want to get the unique list of names of participants who are in group 1.
This requires specifying the two conditions in the code:
Unique(df$participants XXX_group_XXX).
How do I code this condition to get an output vector satisfying both conditions?
A simple solution using only base R:
set.seed(7*11*13)
name <- sample(LETTERS[1:10], 100, replace=TRUE)
G <- sample(1:5, 100, replace=TRUE)
U <- tapply(name, G, unique)
> U
$`1`
[1] "G" "F" "D" "B" "J" "A" "E" "H" "C"
$`2`
[1] "C" "J" "D" "B" "F" "G"
$`3`
[1] "C" "G" "H" "D" "F" "E" "I" "B" "J"
$`4`
[1] "F" "B" "G" "E" "I" "C" "H" "D" "J"
$`5`
[1] "G" "D" "A" "H" "F" "E" "B" "J" "C"
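The question asks for a single group, so it may help to see how one element is pulled out of a tapply result like the one above. The sketch below uses a small made-up data set (not the sampled data) and compares the tapply route with subsetting directly on the group condition; group1 and group1_direct are illustrative names.

```r
# Small fixed data set, made up for illustration
name <- c("A", "B", "A", "C", "B", "D")
G    <- c(1, 2, 1, 1, 2, 3)

# Unique names per group, as in the tapply() call above
U <- tapply(name, G, unique)

# Group labels become character names, so group 1 is element "1"
group1 <- U[["1"]]

# The same result from combining the two conditions directly
group1_direct <- unique(name[G == 1])
```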
Would this work for you? I need to create a data frame first. Then I filter for the group you wish to see and get the unique values per group.
library(dplyr)
set.seed(123)
# create some data
data <- data.frame(
name = sample(LETTERS, size = 100, replace = TRUE),
group = sample(c(1, 2, 3, 4), size = 100, replace = TRUE)
)
# base R
unique(data[data$group == 1, 1])
# or:
unique(data[data$group == 1, "name"])
# tidyverse
data %>%
filter(group == 1) %>%
distinct(name) %>%
pull() # if you want a vector to be returned

Trouble evaluating combinations from combn using purrr

I am trying to use combn to divide a group of n = 20 different units into 3 groups of unequal size -- 4, 6 and 10. Then I am trying to validate for values that must be together within a group -- if one element from the pair exists in the group then the other should also be in the group. If one is not in the group then neither should be in the group. In this fashion, I'd like to evaluate the groups in order to find all possible valid solutions where the rules are true.
x <- letters[1:20]
same_group <- list(
c("a", "c"),
c("d", "f"),
c("b", "k", "r")
)
library(purrr)  # provides keep()
combinations_list <- combn(x, 4, simplify = FALSE)
validate_combinations <- function(x) all(c("a", "c") %in% x) | !any(c("a", "c") %in% x)
valid_combinations <- keep(combinations_list, validate_combinations)
In this way I'd like to combine -> reduce each group until I have a list of all valid combinations. I'm not sure how to combine combinations_list, validate_combinations, and the same_group to check all same_group "rules" against the combinations in the table. The furthest I can get is to check against one combination c("a", "c"), which when run against keep(combinations_list, validate_combinations) is indeed giving me the output I want.
I think once I can do this, I can then use the unpicked values in another combn function for the group of 6 and the group of 10.
We can change the function to accept variable group
validate_combinations <- function(x, group) all(group %in% x) | !any(group %in% x)
then for each group subset the combinations_list which satisfy validate_combinations
lapply(same_group, function(x) combinations_list[
sapply(combinations_list, function(y) validate_combinations(y, x))])
#[[1]]
#[[1]][[1]]
#[1] "a" "b" "c" "d"
#[[1]][[2]]
#[1] "a" "b" "c" "e"
#[[1]][[3]]
#[1] "a" "b" "c" "f"
#[[1]][[4]]
#[1] "a" "b" "c" "g"
#[[1]][[5]]
#[1] "a" "b" "c" "h"
#[[1]][[6]]
#[1] "a" "b" "c" "i"
#[[1]][[7]]
#[1] "a" "b" "c" "j"
#[[1]][[8]]
#[1] "a" "b" "c" "k"
#......
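The output above holds one list of combinations per rule. If what is wanted is instead the combinations that satisfy every rule in same_group at once, one possible sketch in base R follows. The helper validate_all is a hypothetical name, and reading the question as "all rules must hold together" is an assumption.

```r
x <- letters[1:20]
same_group <- list(
  c("a", "c"),
  c("d", "f"),
  c("b", "k", "r")
)
combinations_list <- combn(x, 4, simplify = FALSE)

# TRUE when the members of `group` are all in `combo` or all absent from it
validate_combinations <- function(combo, group) {
  all(group %in% combo) || !any(group %in% combo)
}

# Hypothetical helper: a combination is valid only if it passes every rule
validate_all <- function(combo, groups) {
  all(vapply(groups, function(g) validate_combinations(combo, g), logical(1)))
}

valid_combinations <- Filter(function(combo) validate_all(combo, same_group),
                             combinations_list)
```

For each element of valid_combinations, setdiff(x, combo) gives the units that were not picked, which could then be passed to combn again for the group of 6.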

Reorder a vector with wrap around in R

Let's say I have a simple vector x in R. It is in the order 'a','b','c','d'. Is there a function that would take the vector and reorder it with wrap around? For example, how can I get x to be 'c','d','a','b'?
#Vector x
> x <- letters[1:4]
> x
[1] "a" "b" "c" "d"
#What I want:
> somefcn(x, 3)
[1] "c" "d" "a" "b"
x <- letters[1:4]
shiftnum <- 3
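# seq_len(shiftnum - 1) spells out the wrapped-around prefix; 1:shiftnum-1 parses as (1:shiftnum) - 1
c(x[shiftnum:length(x)], x[seq_len(shiftnum - 1)])
[1] "c" "d" "a" "b"
This is a very rough way to do it, but it works.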
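The same rotation can be wrapped in a small function using modular arithmetic on the indices. This is only a sketch: rotate_to is a hypothetical name standing in for the somefcn of the question, and letting start values beyond length(x) wrap around is a design choice made here, not something the question specifies.

```r
# Hypothetical helper: rotate x so that the element at position `start`
# comes first, wrapping around at the end of the vector
rotate_to <- function(x, start) {
  n <- length(x)
  if (n == 0) return(x)
  idx <- (seq_len(n) + start - 2) %% n + 1
  x[idx]
}

x <- letters[1:4]
rotated <- rotate_to(x, 3)  # "c" "d" "a" "b"
```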

R: Non-greedy version of setdiff?

Here's setdiff normal behaviour:
x <- rep(letters[1:4], 2)
x
# [1] "a" "b" "c" "d" "a" "b" "c" "d"
y <- letters[1:2]
y
# [1] "a" "b"
setdiff(x, y)
# [1] "c" "d"
… but what if I want y to be taken out only once, and therefore get the following result?
# "c" "d" "a" "b" "c" "d"
I'm guessing that there is an easy solution using either setdiff or %in%, but I just cannot see it.
match returns a vector of the positions of (first) matches of its first argument in its second. It's used as an index constructor:
x[ -match(y,x) ]
#[1] "c" "d" "a" "b" "c" "d"
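One caveat with negative indexing on match: an element of y that does not occur in x produces NA, and an empty index vector would select nothing instead of everything. A minimal sketch of a guarded version follows; remove_once is a hypothetical name, not a base R function.

```r
# Hypothetical helper: drop the first occurrence in x of each value in y,
# ignoring values of y that do not occur in x
remove_once <- function(x, y) {
  pos <- match(y, x)
  pos <- pos[!is.na(pos)]           # values of y absent from x give NA
  if (length(pos) == 0) return(x)   # x[-integer(0)] would return an empty vector
  x[-pos]
}

x <- rep(letters[1:4], 2)
res <- remove_once(x, letters[1:2])
```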
If there are duplicates in 'y' and you want removal in proportion to their numbers therein, then the first thing that came to my mind is a for-loop:
y <- c("a","b","a")
x2 <- x
for( i in seq_along(y) ){ x2 <- x2[-match(y[i],x2)] }
> x2
[1] "c" "d" "b" "c" "d"
This would be one possible result of using the tabling approach suggested below. Uses some "set" functions, but this is not really a set problem. Seems somewhat more "vectorised":
c( table(x [x %in% intersect(x,y)]) - table(y[y %in% intersect(x,y)]) ,
table( x[!x %in% intersect(x,y)]) )
a b c d
0 1 2 2
The vecsets package has a vsetdiff function for this.
x <- rep(letters[1:4], 2)
y <- letters[1:2]
vecsets::vsetdiff(x, y)
#[1] "c" "d" "a" "b" "c" "d"
Here is another looping method. I think 42's method is cleaner, but it provides another option.
# construct a table containing the counts in y of all possible values in x and y
myCounts <- table(factor(y, levels=sort(union(x, y))))
# extract these elements from x
x[-unlist(lapply(names(myCounts),
function(i) which(i == x)[seq_len(myCounts[i])]))]
The "non-greedy" aspect comes from [seq_len(myCounts[i])] which only takes the number of identical elements that are present in y
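Another loop-free way to respect duplicate counts is to label every element with its running occurrence number and then compare the labels with %in%. This sketch is not taken from the answers above; occurrence_id is a hypothetical helper name.

```r
x <- rep(letters[1:4], 2)
y <- c("a", "b", "a")

# Hypothetical helper: tag each element with its running occurrence number,
# e.g. "a" "b" "a" becomes "a 1" "b 1" "a 2"
occurrence_id <- function(v) {
  paste(v, ave(seq_along(v), v, FUN = seq_along))
}

# Drop the k-th occurrence of a value from x only if y also has a k-th one
remaining <- x[!occurrence_id(x) %in% occurrence_id(y)]
```

With this x and y, the result matches the x2 produced by the for loop shown earlier in this thread.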

R construct summary of values from columns

I would like to make an array that summarises the rows of a data frame with the unique values contained within said rows.
With the following example code:
ref <- c(1:8)
data1 <- c("A","","C","","","","A","")
data2 <- c("A","","","A","C","","","")
data3 <- c("","B","","","","","","B")
data4 <- c("A","B","","","","D","A","")
initial.data <- data.frame(ref, data1, data2, data3, data4)
I can obtain what I want with:
summary.data <- paste(initial.data[,2], initial.data[,3],
initial.data[,4], initial.data[,5], sep='')
desired.data <- substring(summary.data,1,1)
However, I would like a more parsimonious way of coding this and one that does not assume that each row may only take one value.
You can try
apply(initial.data[-1],1, function(x) unique(x[x!='']))
#[1] "A" "B" "C" "A" "C" "D" "A" "B"
Or
substr(do.call(paste0, initial.data[-1]),1,1)
#[1] "A" "B" "C" "A" "C" "D" "A" "B"
Or use max.col
initial.data[cbind(1:nrow(initial.data),max.col(initial.data[-1]!='')+1)]
#[1] "A" "B" "C" "A" "C" "D" "A" "B"
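The question also asks not to assume one value per row. With apply and unique alone, rows of differing lengths make apply return a list, so one option is to collapse each row's unique values into a single string. A sketch on made-up data (example.data is not the question's data frame), with the comma separator being an arbitrary choice:

```r
# Made-up data in which row 2 holds two different values
ref   <- 1:3
data1 <- c("A", "",  "C")
data2 <- c("A", "B", "")
data3 <- c("",  "D", "C")
example.data <- data.frame(ref, data1, data2, data3, stringsAsFactors = FALSE)

# One string per row: unique non-empty values joined by a comma
summary.data <- apply(example.data[-1], 1, function(x) {
  paste(unique(x[x != ""]), collapse = ",")
})
```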
