Conditional selection of elements of a list in Base R - r

I'm trying to find the unique elements in the variables listed as x.
The only constraint is that I first want to find the variable (here a, b, or c) in the list whose max element is smallest, and keep that variable untouched at the top of the output.
I have tried something but can't implement the constraint above:
P.S. My goal is to achieve a function/looping structure to handle larger lists.
x = list(a = 1:5, b = 3:7, c = 6:9) ## a list of 3 variables; variable `a` has the smallest
## max among all variables in the list, so keep `a`
## untouched at the top of the output.
x[-1] <- Map(setdiff, x[-1], x[-length(x)]) ## Now, take the values of `b` not shared
## with `a`, AND values of `c` not shared
## with `b`.
x
# Output: # This output is OK now, but if we change order of `a`, `b`,
# and `c` in the initial list the output will change.
# This is why the constraint above is necessary.
$a
[1] 1 2 3 4 5
$b
[1] 6 7
$c
[1] 8 9

#Find which element in the list has smallest max.
smallest_max <- which.min(sapply(x, max))
#Rearrange the list by keeping the smallest max in first place
#followed by remaining ones
new_x <- c(x[smallest_max], x[-smallest_max])
#Apply the Map function
new_x[-1] <- Map(setdiff, new_x[-1], new_x[-length(new_x)])
new_x
#$a
#[1] 1 2 3 4 5
#$b
#[1] 6 7
#$c
#[1] 8 9
We can wrap this up in a function and then use it:
keep_smallest_max <- function(x) {
  smallest_max <- which.min(sapply(x, max))
  new_x <- c(x[smallest_max], x[-smallest_max])
  new_x[-1] <- Map(setdiff, new_x[-1], new_x[-length(new_x)])
  new_x
}
keep_smallest_max(x)
#$a
#[1] 1 2 3 4 5
#$b
#[1] 6 7
#$c
#[1] 8 9
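Note that the function above only moves the element with the smallest max to the front; the remaining elements keep their input order. If the result should not depend on the input order at all, one possible variant is to sort the whole list by each element's max first. This is only a sketch: the name keep_sorted_by_max is made up for illustration, and it assumes each variable should be compared with the one whose max is the next smaller.

```r
# Hypothetical variant (not from the answer above): order every element
# by its max, then drop from each element the values of its predecessor.
keep_sorted_by_max <- function(x) {
  x <- x[order(vapply(x, max, numeric(1)))]
  if (length(x) > 1) {
    x[-1] <- Map(setdiff, x[-1], x[-length(x)])
  }
  x
}

# Same data as in the question, but in a different order
shuffled <- list(c = 6:9, a = 1:5, b = 3:7)
res <- keep_sorted_by_max(shuffled)
res
```

With the shuffled input, a still comes first, followed by b and c reduced as in the original output.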

Related

combine elements of list of lists with the same name

I have a list of 4 lists with the same name:
lst1 <-
list(list(c(1,2,3)),list(c(7,8,9)),list(c(4,5,6)),list(c(10,11,12)))
names(lst1) <- c("a","b","a","b")
I want to combine the sublists together (first "a" with second "a", first "b" with second "b"):
result <- list(list(c(1,2,3,4,5,6)),list(c(7,8,9,10,11,12)))
names(result) <- c("a","b")
I have tried multiple things, but can't figure it out.
Since lst1["a"] isn't going to give us all the elements of lst1 named a, we are going to need to work with names(lst1). One base R approach would be
nm <- names(lst1)
result <- lapply(unique(nm), function(n) unname(unlist(lst1[nm %in% n])))
names(result) <- unique(nm)
result
# $a
# [1] 1 2 3 4 5 6
#
# $b
# [1] 7 8 9 10 11 12
Another option is to use unlist first and then split the resulting vector.
vec <- unlist(lst1)
split(unname(vec), sub("\\d+$", "", names(vec)))
#$a
#[1] 1 2 3 4 5 6
#$b
#[1] 7 8 9 10 11 12
Just group the elements with the same name and unlist them:
tapply(lst1,names(lst1),FUN=function(x) unname(unlist(x)))
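The answers above return flat vectors, whereas the desired result in the question wraps each vector in a list. A minimal sketch that reproduces that nested shape, assuming it really is needed, could use split and lapply. The name combined is only illustrative. Note that split returns the groups in sorted name order, not in order of first appearance.

```r
lst1 <- list(list(c(1, 2, 3)), list(c(7, 8, 9)),
             list(c(4, 5, 6)), list(c(10, 11, 12)))
names(lst1) <- c("a", "b", "a", "b")

# Group the elements by name, flatten each group into one vector,
# and wrap it in list() to match the nested shape asked for
combined <- lapply(split(lst1, names(lst1)),
                   function(g) list(unname(unlist(g))))
str(combined)
```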

R: how to find index of all repetition vector values order by unique vector without using loop?

I have a vector of integers like this:
a <- c(2,3,4,1,2,1,3,5,6,3,2)
values<-c(1,2,3,4,5,6)
I want to list, for every unique value in my vector (the unique values being ordered), the positions of its occurrences. My desired output:
rep_indx<-data.frame(c(4,6),c(1,5,11),c(2,7,10),c(3),c(8),c(9))
split fits pretty well here, which returns a list of indexes for each unique value in a:
indList <- split(seq_along(a), a)
indList
# $`1`
# [1] 4 6
#
# $`2`
# [1] 1 5 11
#
# $`3`
# [1] 2 7 10
#
# $`4`
# [1] 3
#
# $`5`
# [1] 8
#
# $`6`
# [1] 9
And you can access the index by passing the value as a character, i.e.:
indList[["1"]]
# [1] 4 6
You can do this, using sapply. The ordering that you need is ensured by the sort function.
sapply(sort(unique(a)), function(x) which(a %in% x))
#### [[1]]
#### [1] 4 6
####
#### [[2]]
#### [1] 1 5 11
#### ...
It will result in a list, giving the indices of your repetitions. It can't be a data.frame because a data.frame needs to have columns of same lengths.
sort(unique(a)) is exactly your values vector.
NOTE: you can also use lapply to force the output to be a list. With sapply, you get a list except if by chance the number of replicates is always the same, then the output will be a matrix... so, your choice!
Perhaps this also works
order(match(a, values))
#[1] 4 6 1 5 11 2 7 10 3 8 9
You can use the lapply function to return a list with the indexes.
lapply(values, function (x) which(a == x))
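One difference between the approaches: split(seq_along(a), a) only has entries for values that actually occur in a, while lapply over values also returns an empty result for values that never occur. A small sketch that gets the same behaviour from split is to convert a to a factor with values as its levels. The extra value 7 below is an assumption added only to show the empty case.

```r
a <- c(2, 3, 4, 1, 2, 1, 3, 5, 6, 3, 2)
values <- c(1, 2, 3, 4, 5, 6, 7)  # 7 does not occur in a (illustration only)

# Using a factor with explicit levels keeps one entry per value,
# in the order given by `values`, even when a value is absent from a
ind_by_value <- split(seq_along(a), factor(a, levels = values))
ind_by_value[["2"]]
ind_by_value[["7"]]
```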

Sorting a list of unequal-size vectors in r

Suppose I have several vectors - maybe they're stored in a list, but if there's a better data structure that's fine too:
ll <- list(c(1,3,2),
c(1,2),
c(2,1),
c(1,3,1))
And I want to sort them, using the first number, then the second number to resolve ties, then the third number to resolve remaining ties, etc.:
c(1,2)
c(1,3,1)
c(1,3,2)
c(2,1)
Are there any built in functions that will allow me to do this or do I need to roll my own solution?
(For those who know Python, what I'm after is something that mimics the behavior of sort in Python)
ll <- list(c(1,3,2),
c(1,2),
c(2,1),
c(1,3,1))
I'd prefer using NA for missing values and using rbind.data.frame instead of paste:
sortfun <- function(l) {
  l1 <- lapply(l, function(x, n) {
    length(x) <- n
    x
  }, n = max(lengths(l)))
  l1 <- do.call(rbind.data.frame, l1)
  l[do.call(order, l1)] #order's default is na.last = TRUE
}
sortfun(ll)
#[[1]]
#[1] 1 2
#
#[[2]]
#[1] 1 3 1
#
#[[3]]
#[1] 1 3 2
#
#[[4]]
#[1] 2 1
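One caveat if the goal is to mimic Python: Python sorts a list that is a prefix of another before the longer one (so [1, 3] comes before [1, 3, 1]), whereas na.last = TRUE puts the padded, shorter vector after it. The sample data has no such pair, so the difference does not show up there. A sketch of a variant that sorts prefixes first, with the made-up name sort_prefix_first, could pass na.last = FALSE to order:

```r
# Hypothetical variant: pad with NA, then let NA sort first so that
# a shorter vector precedes any longer vector it is a prefix of
sort_prefix_first <- function(l) {
  n <- max(lengths(l))
  padded <- lapply(l, function(v) { length(v) <- n; v })
  # keys[[k]] holds the k-th element of every (padded) vector
  keys <- lapply(seq_len(n), function(k) vapply(padded, `[`, numeric(1), k))
  l[do.call(order, c(keys, list(na.last = FALSE)))]
}

# Example data chosen to contain a prefix pair: c(1, 3) and c(1, 3, 1)
ll2 <- list(c(1, 3, 1), c(1, 3), c(2, 1), c(1, 2))
sorted <- sort_prefix_first(ll2)
sorted
```

The original vectors are returned unpadded because the ordering is only used to index the input list.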
Here's an approach that uses data.table.
The result is a rectangular data.table with the rows ordered in the form you described. NA values are filled in where the list item was a different length.
library(data.table)
setorderv(data.table(do.call(cbind, transpose(ll))), paste0("V", 1:max(lengths(ll))))[]
# V1 V2 V3
# 1: 1 2 NA
# 2: 1 3 1
# 3: 1 3 2
# 4: 2 1 NA
This is ugly, but you can use the result on your list with something like:
ll[setorderv(
  data.table(
    do.call(cbind, transpose(ll)))[
      , ind := seq_along(ll)][],
  paste0("V", seq_len(max(lengths(ll)))))$ind]

How to select and remove specific elements or find their index in a vector or matrix?

Let's say I have two vectors:
x <- c(1,16,20,7,2)
y <- c(1, 7, 5,2,4,16,20,10)
I want to remove elements in y that are not in x. That is, I want to remove elements 5, 4, 10 from y.
y
[1] 1 7 2 16 20
In the end, I want vectors x and y to have the same elements. Order does not matter.
My thoughts: The match function lists the indices where the two vectors contain a matching element, but I need a function that does essentially the opposite. I need a function that displays the indices where the elements in the two vectors don't match.
# this lists the indices in y that match the elements in x
match(x,y)
[1] 1 6 7 2 4 # these are the indices that I want; I want to remove
# the other indices from y
Does anyone know how to do this? thank you
You are after intersect
intersect(x,y)
## [1] 1 16 20 7 2
If you want the indices for the elements of y in x, using which and %in% (%in% uses match internally, so you were on the right track here)
which(y %in% x)
## [1] 1 2 4 6 7
As @joran points out in the comments, intersect will drop the duplicates, so perhaps a safe option, if you want to return true matches, would be something like
intersection <- function(x, y) {
  # keep every element of x that also occurs in y, duplicates included
  x[x %in% y]
}
x <- c(1,1,2,3,4)
y <- c(1,2,3,3)
intersection(x,y)
## [1] 1 1 2 3
# compare with
intersect(x,y)
## [1] 1 2 3
intersection(y,x)
## [1] 1 2 3 3
# compare with
intersect(y, x)
## [1] 1 2 3
You then need to be careful about argument order with this modified function (intersect avoids the issue because it drops duplicated elements).
If you want the indices of those elements of y not in x, simply prefix with !, as %in% returns a logical vector
which(!y%in%x)
##[1] 3 5 8
Or if you want the elements use setdiff
setdiff(y,x)
## [1] 5 4 10
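To get exactly the vector shown in the question (the elements of y that are also in x, in y's original order), a short sketch is to subset y with the logical vector from %in%. The name y_kept is just for illustration; x and y are redefined here with the question's values because the code above overwrote them.

```r
x <- c(1, 16, 20, 7, 2)
y <- c(1, 7, 5, 2, 4, 16, 20, 10)

# Keep only the elements of y that occur in x; order and any
# duplicates in y are preserved
y_kept <- y[y %in% x]
y_kept
```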

Assignment to the result of a function changes variable

Looking through the ave function, I found a remarkable line:
split(x, g) <- lapply(split(x, g), FUN) # From ave
Interestingly, this line changes the value of x, which I found unexpected. I expected that split(x,g) would result in a list, which could be assigned to, but discarded afterward. My question is, why does the value of x change?
Another example may explain better:
a <- data.frame(id=c(1,1,2,2), value=c(4,5,7,6))
# id value
# 1 1 4
# 2 1 5
# 3 2 7
# 4 2 6
split(a,a$id) # Split a row-wise by id into a list of size 2
# $`1`
# id value
# 1 1 4
# 2 1 5
# $`2`
# id value
# 3 2 7
# 4 2 6
# Find the row with highest value for each id
lapply(split(a,a$id),function(x) x[which.max(x$value),])
# $`1`
# id value
# 2 1 5
# $`2`
# id value
# 3 2 7
# Assigning to the split changes the data.frame a!
split(a,a$id)<-lapply(split(a,a$id),function(x) x[which.max(x$value),])
a
# id value
# 1 1 5
# 2 1 5
# 3 2 7
# 4 2 7
Not only has a changed, but it changed to a value that does not look like the right hand side of the assignment! Even if assigning to split(a,a$id) somehow changes a (which I don't understand), why does it result in a data.frame instead of a list?
Note that I understand that there are better ways to accomplish this task. My question is why does split(a,a$id)<-lapply(split(a,a$id),function(x) x[which.max(x$value),]) change a?
The help page for split says in its header: "The replacement forms replace values corresponding to such a division." So it really should not be unexpected, although I admit it is not widely used. I do not understand how your example illustrates that the assigned values "do not look like the RHS of the assignment!". The max values are assigned to the 'value' lists within categories defined by the second argument factor.
(I do thank you for the question. I had not realized that split<- was at the core of ave. I guess it is more widely used than I realized, since I think ave is a wonderfully useful function.)
Just after the definition of a, run split(a, a$id) = 1; the result would be:
> a
id value
1 1 1
2 1 1
3 1 1
4 1 1
The key here is that split<- actually modified the LHS with RHS values.
Here's an example:
> x <- c(1,2,3);
> split(x,x==2)
$`FALSE`
[1] 1 3
$`TRUE`
[1] 2
> split(x,x==2) <- split(c(10,20,30),c(10,20,30)==20)
> x
[1] 10 20 30
Note the line where I re-assign split(x,x==2) <- . This actually reassigns x.
As the comments below have stated, you can look up the definition of split<- like so
> `split<-.default`
function (x, f, drop = FALSE, ..., value)
{
ix <- split(seq_along(x), f, drop = drop, ...)
n <- length(value)
j <- 0
for (i in ix) {
j <- j%%n + 1
x[i] <- value[[j]]
}
x
}
<bytecode: 0x1e18ef8>
<environment: namespace:base>
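More generally, R treats an assignment of the form f(x) <- value as a call to the replacement function `f<-` whose result is assigned back to x. That is why x changes, and why the result has the type of x rather than being a list. A small sketch with made-up data, comparing the two spellings:

```r
x <- c(1, 2, 3, 4)
g <- c("a", "b", "a", "b")
new_values <- list(a = c(10, 30), b = c(20, 40))

# Replacement syntax: modifies x1 in place (from the user's point of view)
x1 <- x
split(x1, g) <- new_values

# Equivalent explicit call: the replacement function returns the
# modified copy, which then has to be assigned by hand
x2 <- `split<-`(x, g, value = new_values)

x1
identical(x1, x2)
```

Calling `split<-` directly leaves x itself untouched; only the assignment form rebinds the variable.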

Resources