Return index numbers in R where object lengths are not multiples [duplicate] - r

I have a vector like so:
foo = c("A", "B", "C", "D")
And I want a vector of selected index numbers, which I imagined I could do like so:
which(foo == c("A", "B", "D"))
But apparently this only works if the lengths of the two vectors are multiples, as otherwise you get an incomplete result followed by a warning message:
"longer object length is not a multiple of shorter object length".
So how do I get what I'm after, which is "1 2 4"?

Use match:
match(c('A', 'B', 'D'), foo) # [1] 1 2 4

Using %in% is one option here:
foo <- c("A", "B", "C", "D")
x <- c("A", "B", "D")
seq_along(foo)[foo %in% x] # [1] 1 2 4
The expression foo %in% x evaluates to a logical vector, which is then used to subset the index positions of foo.
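The two answers above behave differently once the lookup values are out of order or missing from foo. A minimal sketch of that difference, where wanted is a hypothetical vector introduced only for illustration:

```r
foo <- c("A", "B", "C", "D")
wanted <- c("D", "A", "Z")   # hypothetical lookup values; "Z" is not in foo

# match() returns one position per lookup value, in lookup order,
# and NA where the value does not occur in foo.
pos_match <- match(wanted, foo)

# which() with %in% returns positions in the order of foo and
# silently drops lookup values that are absent.
pos_in <- which(foo %in% wanted)

print(pos_match)  # 4 1 NA
print(pos_in)     # 1 4
```

Which one fits depends on whether the order of the lookup values and the missing entries need to be preserved.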

Related

R - Data.table fast binary search based subset with multiple values in second key

I have come across this vignette at https://cran.r-project.org/web/packages/data.table/vignettes/datatable-keys-fast-subset.html#multiple-key-point.
My data looks like this:
ID TYPE MEASURE_1 MEASURE_2
1 A 3 3
1 B 4 4
1 C 5 5
1 Mean 4 4
2 A 10 1
2 B 20 2
2 C 30 3
2 Mean 20 2
When I do this ... all works as expected.
setkey(dt, ID, TYPE)
dt[.(unique(ID), "A")] # extract SD of all IDs with Type A
dt[.(unique(ID), "B")] # extract SD of all IDs with Type B
dt[.(unique(ID), "C")] # extract SD of all IDs with Type C
When I try to base the keyed subset on multiple values for the second key, I only get the combinations of the unique values in key 1 with the first value in the vector c() given for the second key. All following values in that vector are ignored.
# extract SD of all IDs with one of the 3 types A/B/C
dt[.(unique(ID), c("A", "B", "C"))]
# previous output is equivalent to
dt[.(unique(ID), "A")] # extract SD of all IDs with Type A
# I want/expect
dt[TYPE %in% c("A", "B", "C")]
What am I missing here, or is this something I cannot do with keyed subsets?
To clarify: As I cannot leave out the key 1 in keyed subsets, the vignette calls for inclusion of the first key with unique(key1)
And defining multiple keys in key 1 works also as expected.
dt[.(c(1, 2), "A")] == dt[ID %in% c(1,2) & TYPE == "A"] # TRUE
The data.table documentation (see help("data.table") or https://rdatatable.gitlab.io/data.table/reference/data.table.html#arguments) says:
character, list and data.frame input to i is converted into a data.table internally using as.data.table.
So, the classical recycling rule used in R (or in data.frame) applies. That is, .(unique(ID), c("A", "B", "C")), which is equivalent to list(unique(ID), c("A", "B", "C")), becomes:
as.data.table(list(unique(ID), c("A", "B", "C")))
and since the length of the longer list element (c("A", "B", "C")) is not a multiple of the length of the shorter one (unique(ID)), you will get an error.
If you want each value in unique(ID) combined with each element in c("A", "B", "C"), you should use CJ(unique(ID), c("A", "B", "C")) instead.
So what you should do is dt[CJ(unique(ID), c("A", "B", "C"))].
Note that dt[.(unique(ID), "A")] works correctly because you passed only one element for the second key and this gets recycled to match the length of unique(ID).
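The CJ() approach can be sketched on the data from the question. This assumes the data.table package is installed; the names lookup, res_keyed and res_scan are hypothetical, and unique(dt$ID) is spelled out with dt$ only so the lookup table can be built outside the brackets.

```r
library(data.table)

# Data from the question.
dt <- data.table(
  ID        = rep(1:2, each = 4),
  TYPE      = rep(c("A", "B", "C", "Mean"), times = 2),
  MEASURE_1 = c(3, 4, 5, 4, 10, 20, 30, 20),
  MEASURE_2 = c(3, 4, 5, 4, 1, 2, 3, 2)
)
setkey(dt, ID, TYPE)

# CJ() builds every ID x TYPE combination (2 x 3 = 6 rows here),
# so the subset does not rely on recycling at all.
lookup <- CJ(unique(dt$ID), c("A", "B", "C"))
res_keyed <- dt[lookup]

# Vector-scan version from the question, for comparison.
res_scan <- dt[TYPE %in% c("A", "B", "C")]

print(res_keyed)
print(res_scan)
```

Both subsets should contain the same six rows, with the keyed version using binary search on the key.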

How to return the index of certain duplicate strings in a character vector, ignoring the index of the first occurrence of the duplicate string?

I have a vector of strings and I want to return the index of the duplicate values, except for the index of the first occurrence of a duplicate value, given another vector with matches. For example:
x <- c("a", "b", "c", "b", "a", "a", "c", "c")
matching_values <- c("a", "b")
So I would like to have an integer vector returned with the indexes 4, 5, 6. So the first duplicate of a occurs at position 5 and the second duplicate at position 6. The first duplicate for b occurs at index 4 and because I did not specify to match c, there will be no index returned. Thank you!
You could use:
which(duplicated(x) & x %in% matching_values)
#[1] 4 5 6
We can use duplicated with %in%
which(x %in% matching_values & duplicated(x))
#[1] 4 5 6
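To see why combining the two conditions works, it may help to look at the intermediate logical vectors. A small sketch, with is_repeat, is_wanted and idx as hypothetical names:

```r
x <- c("a", "b", "c", "b", "a", "a", "c", "c")
matching_values <- c("a", "b")

# duplicated() is FALSE for the first occurrence of each value
# and TRUE for every later occurrence.
is_repeat <- duplicated(x)

# %in% is TRUE wherever x holds one of the values of interest.
is_wanted <- x %in% matching_values

# Positions that are both a repeat and one of the wanted values.
idx <- which(is_repeat & is_wanted)
print(idx)  # 4 5 6
```

The repeats of "c" at positions 7 and 8 are flagged by duplicated() but filtered out by the %in% condition.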

Intersection of vectors within a same list [duplicate]

This question already has answers here:
How to find common elements from multiple vectors?
(3 answers)
Closed 5 years ago.
I have a list of vectors with character strings:
lst <- list(v1 = c("A", "B", "C", "D"), v2 = c("B", "C", "D", "E"), v3 = c("C", "D", "E", "F"))
How can I get a new vector that includes only those character strings that occur in every vector? E.g. in this case
out <- c("C", "D")
A solution with lapply would be great.
The easiest function to use would be Reduce. Try
Reduce(intersect, lst)
That's basically the same as
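# data (three example vectors, same values as in the question)
A <- c("A", "B", "C", "D")
B <- c("B", "C", "D", "E")
C <- c("C", "D", "E", "F")
x <- list(A, B, C)
# these are equivalent
Reduce(intersect, x)
intersect(intersect(A, B), C)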
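The folding that Reduce performs can be made visible with its accumulate argument, which keeps each intermediate result. A short sketch on the list from the question, with steps and common as hypothetical names:

```r
lst <- list(
  v1 = c("A", "B", "C", "D"),
  v2 = c("B", "C", "D", "E"),
  v3 = c("C", "D", "E", "F")
)

# accumulate = TRUE keeps every intermediate intersection:
# v1, then intersect(v1, v2), then intersect(intersect(v1, v2), v3).
steps <- Reduce(intersect, lst, accumulate = TRUE)

# The last step is the intersection across all vectors.
common <- steps[[length(steps)]]
print(common)  # "C" "D"
```

This works for a list of any length, which is the main advantage over nesting intersect calls by hand.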

Joining lists into a vector [duplicate]

This question already has answers here:
Create sequence of repeated values, in sequence?
(3 answers)
Closed 6 years ago.
I want to create the following vector using a, b, c repeating each letter thrice:
BB<-c("a","a","a","b","b","b","c","c","c")
This is my code:
Alphabet<-c("a","b","c")
AA<-list()
for(i in 1:3){
AA[[i]]<-rep(Alphabet[i],each=3)
}
BB<-do.call(rbind,AA)
But I am getting a dataframe:
dput(BB)
structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L,
3L))
What am I doing wrong?
As Akrun mentioned, we can use the rep function directly.
Create a vector containing the letters:
A <- c("A","B","C")
Then apply rep to that vector, using the each argument:
AA <- rep(A,each=3)
print(AA)
[1] "A" "A" "A" "B" "B" "B" "C" "C" "C"
You should use the c function to concatenate, not rbind. This will give you a vector.
Alphabet<-c("a","b","c")
AA<-list()
for(i in 1:3){
AA[[i]]<-rep(Alphabet[i],each=3)
}
BB<-do.call(c,AA)
Akrun's comment is also correct, if that's what you want.
You can also concatenate the rep function like so:
BB <- c(rep("a", 3), rep("b", 3), rep("c", 3))
Here is a solution, but note that this form of appending is not efficient for large inputs, because the vector is copied on every iteration:
Alphabet <- c("a","b","c")
bb <- c()
for (i in seq_along(Alphabet)) {
bb <- c(bb, rep(Alphabet[i], 3))
}
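The answers above come down to two points: rep is already vectorised, and a list of vectors is flattened with c or unlist rather than rbind, which stacks the pieces as rows of a matrix. A compact sketch, with BB_each, BB_times and BB_list as hypothetical names:

```r
Alphabet <- c("a", "b", "c")

# each repeats every element in place; times repeats the whole vector.
BB_each  <- rep(Alphabet, each = 3)   # a a a b b b c c c
BB_times <- rep(Alphabet, times = 3)  # a b c a b c a b c

# If the pieces already sit in a list, unlist() flattens them into a vector.
AA <- lapply(Alphabet, rep, times = 3)
BB_list <- unlist(AA)

print(BB_each)
print(BB_list)
```

Either rep(Alphabet, each = 3) or unlist(AA) avoids the loop and the repeated copying entirely.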

Subset a data frame using OR when the column contains a factor

I would like to make a subset of a data frame in R that is based on one OR another value in a column of factors but it seems I cannot use | with factor values.
Example:
# fake data
x <- sample(1:100, 9)
nm <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
fake <- cbind(as.data.frame(nm), as.data.frame(x))
# subset fake to only rows with name equal to a or b
fake.trunk <- fake[fake$nm == "a" | "b", ]
produces the error:
Error in fake$nm == "a" | "b" :
operations are possible only for numeric, logical or complex types
How can I accomplish this?
Obviously my actual data frame has more than 3 values in the factor column so just using != "c" won't work.
You need fake.trunk <- fake[fake$nm == "a" | fake$nm == "b", ]. A more concise way of writing that (especially with more than two conditions) is:
fake[ fake$nm %in% c("a","b"), ]
Another approach would be to use subset() and write
fake.trunk = subset(fake, nm %in% c('a', 'b'))
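One detail worth checking after subsetting on a factor is that the unused level stays attached to the column. A minimal sketch, assuming a fixed x = 1:9 in place of the random sample from the question so the result is reproducible; keep is a hypothetical name:

```r
fake <- data.frame(
  nm = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c")),
  x  = 1:9  # stand-in for sample(1:100, 9)
)

# Membership test on the factor column; works for any number of values.
keep <- fake$nm %in% c("a", "b")
fake.trunk <- fake[keep, ]

# The level "c" is still recorded even though no row uses it.
print(levels(fake.trunk$nm))

# droplevels() removes levels that no longer occur.
fake.trunk <- droplevels(fake.trunk)
print(levels(fake.trunk$nm))
```

Dropping unused levels matters mostly for later steps such as table() or plotting, where empty levels would otherwise show up.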
