I have three lists with 2 elements each. How do I check whether every element has the same length in every list? Preferably using purrr. Thank you!
list.a = list(a = 1, b = c(1, 2))
list.b = list(a = 2, b = c(1, 2))
list.c = list(a = 3, b = c(1, 2, 3))
Should return T, T, F.
Not entirely sure about the requirement, but here are two potentially useful snippets.
library(purrr)
map_lgl(transpose(list(list.a, list.b, list.c)), ~ var(lengths(.x)) == 0)
a b
TRUE FALSE
Or, for a more general output that is easier to manipulate:
map_dfr(list(list.a, list.b, list.c), ~map(.x, length))
a b
1 1 2
2 1 2
3 1 3
For any two lists:
all(lengths(list.a)==lengths(list.b))
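To check whether all lists match, compare each one against the first. Folding with Reduce() does not work here, because after the first step the accumulated value is a single TRUE/FALSE rather than a list:
same_length <- function(x, y) all(lengths(x) == lengths(y))
lists <- list(list.a, list.b, list.c)
all(sapply(lists[-1], same_length, y = lists[[1]]))
If you want to use purrr:
purrr::every(lists[-1], same_length, y = lists[[1]])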
"Each element has the same length in every list?" would be answered by a single TRUE or FALSE. Based on your expected output, I think you want to compare each list's element lengths against a specific set of lengths.
library(purrr)
master_list <- list(list.a, list.b, list.c)
map_lgl(master_list, ~ all(lengths(.x) == 1:2))
[1] TRUE TRUE FALSE
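If the expected lengths should not be hard-coded as 1:2, one option is to take them from one of the lists. This is a minimal sketch under the assumption that the first list is the reference; ref and same_as_ref are illustrative names, and it uses base vapply() so it runs without purrr:

```r
list.a <- list(a = 1, b = c(1, 2))
list.b <- list(a = 2, b = c(1, 2))
list.c <- list(a = 3, b = c(1, 2, 3))
master_list <- list(list.a, list.b, list.c)

# Assumption: the first list defines the expected element lengths
ref <- lengths(master_list[[1]])

# TRUE where a list's element lengths (and element names) match the reference exactly
same_as_ref <- vapply(master_list, function(x) identical(lengths(x), ref), logical(1))
same_as_ref
```

The purrr equivalent is map_lgl(master_list, ~ identical(lengths(.x), ref)). Because identical() also compares names, a list whose elements are named differently counts as a mismatch.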
Related
Can I pass a custom compare function to order that, given two items, indicates which one is ranked higher?
In my specific case I have the following list.
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
If we take two vectors a and b, the index of the first element i at which a[i] > b[i] or a[i] < b[i] should determine what vector comes first. In this example, scores[['d']] > scores[['a']] because scores[['d']][2] > scores[['a']][2] (note that it doesn't matter that scores[['d']][5] < scores[['a']][5]).
Comparing two of those vectors could look something like this.
compare <- function(a, b) {
# get first element index at which vectors differ
i <- which.max(a != b)
if(a[i] > b[i])
1
else if(a[i] < b[i])
-1
else
0
}
The sorted keys of scores by using this comparison function should then be d, b, a, c.
From other solutions I've found, they mess with the data before ordering or introduce S3 classes and apply comparison attributes. With the former I fail to see how to mess with my data (maybe turn it into strings? But then what about numbers above 9?), with the latter I feel uncomfortable introducing a new class into my R package only for comparing vectors. And there doesn't seem to be a sort of comparator parameter I'd want to pass to order.
Here's an attempt. I've explained every step in the comments.
compare <- function(a, b) {
# subtract vector b from vector a
comparison <- a - b
# get the first non-zero difference (NA if the vectors are identical)
result <- comparison[comparison != 0][1]
# return 1 if a is larger at the first difference, 2 if b is larger (0 if equal)
if (is.na(result)) {return(0)} else if (result > 0) {return(1)} else {return(2)}
}
compare_list <- function(list_) {
# get all pairs of indices to compare
comparisons <- combn(length(list_), 2)
# compare all pairs
results <- apply(comparisons, 2, function(x) {
# get the index of the "winner"
x[compare(list_[[x[1]]], list_[[x[2]]])]
})
# frequency table (how often each vector "won"; this gives the ranking you want)
fr_tab <- table(results)
# the vector that never won comes last
last_vector <- which(!(seq_along(list_) %in% as.numeric(names(fr_tab))))
# return the sorted indices, with the last vector appended
c(as.numeric(names(sort(fr_tab, decreasing = TRUE))), last_vector)
}
If you run the function on your example, the result is
> compare_list(scores)
[1] 4 2 1 3
I haven't handled the case where two vectors are identical, since you haven't explained how it should be treated.
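As a point of comparison, the comparator from the question can also drive a plain insertion sort directly, without computing every pair up front. This is a minimal sketch: sort_keys_by() is a hypothetical helper (not a base R function), and it makes O(n^2) calls to the R-level comparator, so it only suits short lists:

```r
scores <- list(
  'a' = c(1, 1, 2, 3, 4, 4),
  'b' = c(1, 2, 2, 2, 3, 4),
  'c' = c(1, 1, 2, 2, 3, 4),
  'd' = c(1, 2, 3, 3, 3, 4)
)

# The asker's comparator: 1 if a ranks higher, -1 if lower, 0 if identical
compare <- function(a, b) {
  i <- which.max(a != b)
  if (a[i] > b[i]) 1 else if (a[i] < b[i]) -1 else 0
}

# Hypothetical helper: insertion sort of the list names, highest-ranked first,
# driven entirely by a user-supplied comparator
sort_keys_by <- function(x, cmp) {
  keys <- character(0)
  for (k in names(x)) {
    pos <- length(keys) + 1                  # default: append at the end
    for (j in seq_along(keys)) {
      if (cmp(x[[k]], x[[keys[j]]]) > 0) {   # k outranks the key at position j
        pos <- j
        break
      }
    }
    keys <- append(keys, k, after = pos - 1)
  }
  keys
}

sort_keys_by(scores, compare)
```

Ties (a comparator result of 0) keep their original relative order, because a key is only moved ahead of keys it strictly outranks.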
The native R way to do this is to introduce an S3 class.
There are two things you can do with the class. You can define a method for xtfrm that converts your list entries to numbers. That could be vectorized, and conceivably could be really fast.
But you were asking for a user defined compare function. This is going to be slow because R function calls are slow, and it's a little clumsy because nobody does it. But following the instructions in the xtfrm help page, here's how to do it:
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
# Add a class to the list
scores <- structure(scores, class = "lexico")
# Need to keep the class when subsetting
`[.lexico` <- function(x, i, ...) structure(unclass(x)[i], class = "lexico")
# Careful here: identical() might be too strict
`==.lexico` <- function(a, b) {identical(a, b)}
`>.lexico` <- function(a, b) {
a <- a[[1]]
b <- b[[1]]
i <- which(a != b)
length(i) > 0 && a[i[1]] > b[i[1]]
}
is.na.lexico <- function(a) FALSE
sort(scores)
#> $c
#> [1] 1 1 2 2 3 4
#>
#> $a
#> [1] 1 1 2 3 4 4
#>
#> $b
#> [1] 1 2 2 2 3 4
#>
#> $d
#> [1] 1 2 3 3 3 4
#>
#> attr(,"class")
#> [1] "lexico"
Created on 2021-11-27 by the reprex package (v2.0.1)
This is the opposite of the order you asked for, because by default sort() sorts in increasing order. If you really want d, b, a, c, use sort(scores, decreasing = TRUE).
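The other route mentioned above, an xtfrm method, avoids calling an R comparison function for every pair. A minimal sketch, assuming every vector in the list has the same length: xtfrm.lexico() stacks the entries as rows and calls order() with one key per column, which breaks ties from left to right and therefore yields lexicographic ranks. Try it in a fresh session, because once xtfrm.lexico exists, order() uses it instead of the comparison operators:

```r
scores <- list(
  'a' = c(1, 1, 2, 3, 4, 4),
  'b' = c(1, 2, 2, 2, 3, 4),
  'c' = c(1, 1, 2, 2, 3, 4),
  'd' = c(1, 2, 3, 3, 3, 4)
)
scores <- structure(scores, class = "lexico")

# Keep the class when subsetting
`[.lexico` <- function(x, i, ...) structure(unclass(x)[i], class = "lexico")

# Map each entry to its lexicographic rank (assumes equal-length vectors)
xtfrm.lexico <- function(x) {
  rows <- do.call(rbind, unclass(x))                         # one row per entry
  keys <- unname(as.list(as.data.frame(rows)))               # one key per position
  o <- do.call(order, keys)                                  # column 1 first, then column 2, ...
  ranks <- integer(length(o))
  ranks[o] <- seq_along(o)                                   # invert the permutation into ranks
  ranks
}

names(sort(scores, decreasing = TRUE))
```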
Here's another, very simple solution:
sort(sapply(scores, function(x) as.numeric(paste(x, collapse = ""))), decreasing = TRUE)
It takes all the vectors, "compresses" each one into a single number by pasting its digits together, and then sorts those numbers in decreasing order.
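As the question points out, pasting digits together breaks once an entry has more than one digit. The sketch below shows the failure on a hypothetical two-element toy list and, as an alternative technique (not part of this answer), orders on one key per position with do.call(order, ...). lexi_order() is an illustrative name and assumes all vectors have the same length:

```r
scores <- list(
  'a' = c(1, 1, 2, 3, 4, 4),
  'b' = c(1, 2, 2, 2, 3, 4),
  'c' = c(1, 1, 2, 2, 3, 4),
  'd' = c(1, 2, 3, 3, 3, 4)
)

# Lexicographically y > x (2 > 1 at position 1), but pasting gives x = 110, y = 21
toy <- list(x = c(1, 10), y = c(2, 1))
pasted <- sapply(toy, function(v) as.numeric(paste(v, collapse = "")))

# Order names by comparing position 1, then position 2, ... (equal lengths assumed)
lexi_order <- function(lst, decreasing = TRUE) {
  keys <- unname(as.list(as.data.frame(do.call(rbind, lst))))
  names(lst)[do.call(order, c(keys, decreasing = decreasing))]
}

lexi_order(scores)
lexi_order(toy)
```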
Let's assume we have four vectors:
a <- c(200,204,209,215)
b <- c(215,220,235,245)
c <- c(230,236,242,250)
d <- c(240,242,243,267)
I basically want to create a loop that computes the differentials between each pair and then calculates the Z scores for those differentials, so something like scale(d-a). How do I create a loop that goes scale(b-a), then scale(c-a), scale(d-a), etc.? Many thanks.
Single named variables don't lend themselves too well to "looping".
Let's use a list() of vectors instead:
vecs <- list(
a = c(200,204,209,215),
b = c(215,220,235,245),
c = c(230,236,242,250),
d = c(240,242,243,267)
)
This allows us to apply a function to all pairs using combn:
scale_diff <- function(subset) {
z <- scale(subset[[1]] - subset[[2]])
colnames(z) <- paste(names(subset), collapse = " - ")
z
}
z_scores <- combn(vecs, 2, scale_diff, simplify = FALSE)
Now z_scores is a list of 6 matrices (column vectors). The column names show you which vectors were subtracted before scaling.
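combn() passes each pair in list order, so scale_diff() computes a - b, which is the negation of the scale(b - a) asked for in the question. If the sign matters, a variant that subtracts the first member of the pair from the second and names the results could look like this (scale_diff2 is a hypothetical name):

```r
vecs <- list(
  a = c(200, 204, 209, 215),
  b = c(215, 220, 235, 245),
  c = c(230, 236, 242, 250),
  d = c(240, 242, 243, 267)
)

# Subtract the earlier vector from the later one (e.g. b - a), then z-score it
scale_diff2 <- function(pair) as.vector(scale(pair[[2]] - pair[[1]]))

z_scores <- combn(vecs, 2, scale_diff2, simplify = FALSE)
# Label each result "b - a", "c - a", ..., in the same pair order combn() uses
names(z_scores) <- combn(names(vecs), 2, function(p) paste(p[2], "-", p[1]))
z_scores[["b - a"]]
```

as.vector() drops the "scaled:center" and "scaled:scale" attributes; leave it out if you need them.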
We can place the vectors in a list, use combn to get the combinations and then apply the difference:
lst1 <- list(a = a, b = b, c = c, d = d)
out <- combn(lst1, 2, FUN = function(x) scale(Reduce(`-`, x))[,1])
colnames(out) <- combn(names(lst1), 2, FUN = paste, collapse='_')
out
# a_b a_c a_d b_c b_d c_d
#[1,] 0.9108601 1.2009612 0.1290994 -0.7643506 -0.753390 -0.2219686
#[2,] 0.7759179 0.2401922 0.3872983 -0.9441978 -0.360317 0.3699477
#[3,] -0.5735045 -0.2401922 0.9036961 0.6744270 1.474024 1.1098432
#[4,] -1.1132735 -1.2009612 -1.4200939 1.0341214 -0.360317 -1.2578222
As @AlexR mentioned in the comments, if the attributes are important, remove [,1] and keep each result as a one-column matrix (returned as a list):
out <- combn(lst1, 2, FUN = function(x) scale(Reduce(`-`, x)), simplify = FALSE)
I have a similar issue to the one in this question: Compare every 2 rows and show mismatches in R
I would like to compare not only 2 rows but for example 3, 4, etc.
I have a data.table here:
library(data.table)
DT <- data.table(A = rep(1:2, 2), B = rep(1:4, 2),
C = rep(1:2, 1), key = "A")
Then I use
dfs <- split(DT, DT$A)
comp <- function(x) sapply(x, function(u) u[1]==u[2])
matches <- sapply(dfs, comp)
For 3 rows:
comp <- function(x) sapply(x, function(u) u[1]==u[2] & u[1]==u[3])
Is that accurate? How can I generalize it in a more elegant way?
Try this:
comp2 <- function(dt, i, rws){
k <- length(rws)
tmp <- as.numeric(dt[i])
tmp <- as.data.table(matrix(rep(tmp, k), nrow = k, byrow = TRUE, dimnames = list(NULL, colnames(dt))))
ans <- (dt[rws] == tmp)
ans
}
This function takes three arguments:
-> dt -- your data.table (or the sub-data.tables obtained from splitting your original one, up to you)
-> i -- the row you want to compare
-> rws -- vector of row numbers you want to compare i with (e.g. c(2, 3, 4) would compare i with rows 2, 3 and 4)
It then creates a new data.table consisting of row i stacked k times, so that a data.frame-to-data.frame comparison is possible.
example:
comp2(DT, 1, c(2, 3, 4))
# A B C
#[1,] TRUE FALSE TRUE
#[2,] FALSE FALSE FALSE
#[3,] FALSE FALSE FALSE
This compares row 1 of your data.table DT to rows 2, 3 and 4.
If you want one answer per column, use colSums(ans) == k instead of ans: it is TRUE for columns where your chosen row matches every row you compare it to, so FALSE means it differs from at least one of them.
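A shorter generalization of the question's comp() also works for any number of rows: all(u == u[1]) is equivalent to u[1] == u[2] & u[1] == u[3] & ... across the whole group. This is a minimal sketch that uses a small hypothetical data.frame in place of the data.table so it runs without extra packages; sapply() iterates over data.table columns the same way:

```r
# Generalized comp(): for each column, do all rows in the group equal the first row?
comp_all <- function(x) sapply(x, function(u) all(u == u[1]))

# Hypothetical stand-in for the asker's data; group on column A as in the question
df <- data.frame(A = c(1, 1, 1, 2, 2), B = c(5, 5, 5, 6, 7), C = c(3, 3, 4, 8, 8))
groups <- split(df, df$A)
matches <- sapply(groups, comp_all)
matches
```

Each column of matches is one group from split() and each row is a variable. If a column contains NA, all() returns NA for that group.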
I am having a relatively simple problem with R, which I hope we could find a solution to.
My aim is to define the following list, in which element c should be the sum of elements a and b defined just before it:
ex.list = list(
a = 1,
b = 2,
c = a+b
)
Code throws an error (Error: object 'a' not found), indicating that we cannot use the a and b elements defined just above.
Of course, we can simply compute the sum outside the list definition:
ex.list = list(
a = 1,
b = 2
)
ex.list$c = ex.list$a + ex.list$b
Or use other variables when creating the list:
a.ex = 1
b.ex = 2
ex.list = list(
a = a.ex,
b = b.ex,
c = a.ex+b.ex
)
Unfortunately, I am not interested in the above solutions. Is there any way to do the sum in the list definition?
You can write your own list function that does lazy evaluation:
lazyList <- function(...) {
tmp <- match.call(expand.dots = FALSE)$`...`
lapply(tmp, eval, envir = tmp)
}
lazyList(
a = 1,
b = 2,
c = a+b
)
#$a
#[1] 1
#
#$b
#[1] 2
#
#$c
#[1] 3
However, the following does not work with this approach, because c is still an unevaluated expression when d = c * a is computed:
lazyList(
a = 1,
b = 2,
d = c * a,
c = a+b
)
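If later elements only need to refer to earlier ones, the arguments can instead be evaluated one at a time, each in an environment that holds the elements created so far. This is a minimal sketch; seqList is a hypothetical name and it assumes every argument is named. The tibble package's lst() builds lists around the same idea. Evaluation is strictly in order, so an element must still be defined before it is used:

```r
seqList <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated, named arguments
  env <- new.env(parent = parent.frame())        # caller's variables stay visible
  for (nm in names(exprs)) {
    # evaluate in order; each result becomes visible to later arguments
    assign(nm, eval(exprs[[nm]], envir = env), envir = env)
  }
  mget(names(exprs), envir = env)
}

res <- seqList(a = 1, b = 2, c = a + b)
str(res)
```

Unlike lazyList(), arguments are evaluated rather than kept as expressions, so something like seqList(a = x, b = a * 2) also works when x is a variable in the calling environment.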
No, you can't do that. But you can do mad things like this:
> (function(a,b,c=a+b){list(a=a,b=b,c=c)})(11,22)
$a
[1] 11
$b
[1] 22
$c
[1] 33
But really, if you have a list you wish to construct in a particular way, write a function to do it. It's not difficult.
I have a list like:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
Is there a (loop-free) way to identify the positions of the elements? For example, if I want to replace the values of "C" with 5, regardless of where the element "C" is found, can I do something like:
Aindex <- find_index("A", mylist)
mylist[Aindex] <- 5
I have tried grepl, and in the current example, the following will work:
mylist[grepl("C", mylist)][[1]][["C"]]
but this requires an assumption of the nesting level.
The reason that I ask is that I have a deep list of parameter values, and a named vector of replacement values, and I want to do something like
replacements <- c(a = 1, C = 5)
for(i in names(replacements)){
indx <- find_index(i, mylist)
mylist[indx] <- replacements[i]
}
This is an adaptation of my previous question, update a node (of unknown depth) using xpath in R?, using R lists instead of XML.
One method is to use unlist and relist.
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
tmp <- as.relistable(mylist)
tmp <- unlist(tmp)
tmp[grep("(^|\\.)C$", names(tmp))] <- 5
tmp <- relist(tmp)
Because list names from unlist are concatenated with a ., you'll need to be careful with grep and how your parameters are named. If there is not a . in any of your list names, this should be fine. Otherwise, names like list(.C = 1) will fall into the pattern and be replaced.
Based on this question, you could try it recursively like this:
find_and_replace <- function(x, find, replace){
if(is.list(x)){
n <- names(x) == find
x[n] <- replace
lapply(x, find_and_replace, find=find, replace=replace)
}else{
x
}
}
Testing in a deeper mylist:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3, d = list(C=10, D=55)))
find_and_replace(mylist, "C", 5)
$a
[1] 1
$b
$b$A
[1] 1
$b$B
[1] 2
$c
$c$C ### it worked
[1] 5
$c$D
[1] 3
$c$d
$c$d$C ### it worked
[1] 5
$c$d$D
[1] 55
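For completeness, here is a sketch of the find_index() helper the question asks for. It is hypothetical (not part of base R or any package) and recursive rather than loop-free. It returns each match as a vector of positions, which [[<- accepts for recursive indexing, so the replacement loop from the question works with [[ instead of [:

```r
# Hypothetical helper: positions of every element named `name`, at any depth
find_index <- function(name, x, path = integer(0)) {
  out <- list()
  for (i in seq_along(x)) {
    here <- c(path, i)
    if (identical(names(x)[i], name)) out <- c(out, list(here))          # record a match
    if (is.list(x[[i]])) out <- c(out, find_index(name, x[[i]], here))   # search deeper
  }
  out
}

mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3, d = list(C = 10, D = 55)))
idx <- find_index("C", mylist)

# Replace every "C" element using recursive indexing
for (pos in idx) mylist[[pos]] <- 5
str(mylist)
```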
This can now also be done using rrapply in the rrapply-package (an extended version of base rapply). To return the position of an element in the nested list based on its name, we can use the special arguments .xpos and .xname. For instance, to look up the position of the element with name "C":
library(rrapply)
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
## get position C-node
(Cindex <- rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x, .xpos) .xpos, how = "unlist"))
#> c.C1 c.C2
#> 3 1
We could then update its value in the nested list with:
## update value C-node
mylist[[Cindex]] <- 5
The two steps can also be combined directly in the call to rrapply:
rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x) 5, how = "replace")
#> $a
#> [1] 1
#>
#> $b
#> $b$A
#> [1] 1
#>
#> $b$B
#> [1] 2
#>
#>
#> $c
#> $c$C
#> [1] 5
#>
#> $c$D
#> [1] 3