I was trying to find a "readily available" function to do the following:
> my_array = c(5,9,11,10,6,5,9,13)
> my_array
[1] 5 9 11 10 6 5 9 13
> my_test <- c(5, 6)
> new_match_function(my_test, my_array)
[1] 1 5 6
# or instead, maybe:
# [[1]]
# [1] 1 6
# [[2]]
# [1] 5
For my purposes, %in% is close enough, since it will return:
> my_array %in% my_test
[1] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
and I could just do:
> seq(length(my_array))[my_array %in% my_test]
[1] 1 5 6
But it just seems that something like match should provide this capability: a means to return multiple elements from the match.
If I were to create a package simply to provide this solution, it would not be strongly adopted (for good reason... this tiny use case is not worth installing a package).
Is there a solution already available? If not, where is a good place for me to add this? As I showed, it's easy enough to solve without a new function, but for match to not allow for multiple matches seems crazy. I'd ideally like to either:
Find out that I'm wrong and there is a direct function to accomplish this, or
Be able to alter match itself so that it can return multiple occurrences.
But my impression (right or wrong) has been that any adjustments to the base code are more trouble than they are worth.
For simple cases, which(my_array %in% my_test) or lapply(my_test, function(x) which(my_array==x)) works fine, but those are not the most efficient.
For the first case (just knowing which are matches, not seeing to which elements they correspond), using the fastmatch-package may help, it has the %fin% (fast-in) function, that keeps a hash table of your array so that subsequent lookups are more efficient.
For the second case, there is findMatches in the S4Vectors-bioconductor-package. (https://bioconductor.org/packages/release/bioc/html/S4Vectors.html)
Note that this function doesn't return a list, but a hits-object. To get a list, you need the buioconductor IRanges-package as well (and use as.list). (https://bioconductor.org/packages/release/bioc/html/IRanges.html)
Related
I have transitioned from STATA to R, and I was experimenting with different data types so that R's data structures are clear in my mind.
Here's how I set up my data structure:
b<-list(u=5,v=12)
c<-list(u=7)
j<-list(name="Joe",salary=55000,union=T)
bcj<-list(b,c,j)
Now, I was trying to figure out different ways to access u=5. I believe there are three ways:
Try1:
bcj[[1]][[1]]
I got 5. Correct!
Try2:
bcj[[1]][["u"]]
I got 5. Correct!
Try3:
bcj[[1]]$u
I got 5. Correct!
Try4
bcj[[1]][1][1]
Here's what I got:
bcj[[1]][1][1]
$u
[1] 5
class(bcj[[1]][1][1])
[1] "list"
Question 1: Why did this happen?
Also, I experimented with the following:
bcj[[1]][1][1][1][1][1]
$u
[1] 5
class(bcj[[1]][1][1][1][1][1])
[1] "list"
Question 2: I would have expected an error because I don't think so many lists exist in bcj, but R gave me a list. Why did this happen?
PS: I did look at this thread on SO, but it's talking about a different issue.
I think this is sufficient to answer your question. Consider a length-1 list:
x <- list(u = 5)
#$u
#[1] 5
length(x)
#[1] 1
x[1]
x[1][1]
x[1][1][1]
...
always gives you the same:
#$u
#[1] 5
In other words, x[1] will be identical to x, and you fall into infinite recursion. No matter how many [1] you write, you just get x itself.
If I create t1<-list(u=5,v=7), and then do t1[2][1][1][1]...this works as well. However, t1[[2]][2] gives NA
That is the difference between [[ and [ when indexing a list. Using [ will always end up with a list, while [[ will take out the content. Compare:
z1 <- t1[2]
## this is a length-1 list
#$v
#[1] 7
class(z1)
# "list"
z2 <- t1[[2]]
## this takes out the content; in this case, a vector
#[1] 7
class(z2)
#[1] "numeric"
When you do z1[1][1]..., as discussed above, you always end up with z1 itself. While if you do z2[2], you surely get an NA, because z2 has only one element, and you are asking for the 2nd element.
Perhaps this post and my answer there is useful for you: Extract nested list elements using bracketed numbers and names?
x <- list(l1=list(1:4),l2=list(2:5),l3=list(3:8))
I know [] is used for extracting multiple elements and [[]] is used to extract a single element in a list inside a list. I need help in extracting multiple elements in a list inside another list. For example I need to extract 1,3 from list l1 which is inside another list?
For full details, see help(Extract) which covers [[ and [
The [[ operator can walk/search nested lists in a single step, by providing a vector of names OR indices (a path):
> y = list(a=list(b=1))
> y[[c("a","b")]]
[1] 1
> y[[c(1,1)]]
[1] 1
You can't mix names and indices:
> y[[c("a",1)]]
NULL
It seems like you are asking a different question, since your inner lists are not named.
Here's a solution using only numeric indices:
> x[[c(1,1)]]
[1] 1 2 3 4
> x[[c(1,1)]][c(1,3)]
[1] 1 3
the first 1 gets the first element of the first list. The second 1 unwraps it to expose the vector inside.
This might be useful if your real use case involves more complex paths, but to avoid surprising other programmers, in the given example the following...
x[["l1"]][[1]][c(1,3)]
...is probably preferable. The second 1 unwraps the list.
In your case, the following is also equivalent
unlist(x[["l1"]])[c(1,3)]
It sounds like you might be interested in exploring the rapply function (recursive lapply).
If I understand your question correctly, you could do something like this:
rapply(x[["l1"]], f=`[`, ...=c(1, 3))
# [1] 1 3
which is a little different than:
lapply(x[["l1"]], `[`, c(1, 3))
# [[1]]
# [1] 1 3
Could someone please just tell me the switch in R that returns the second argument if true and the third if false?
I have searched for switch and if else function and I have looked through the documentation but when using ubiquitous terms like if and else it seems very hard to identify a solution.
I am looking for something like:
f(TRUE,1,2); f(FALSE,1,2)
[1] 1
[1] 2
I am working on reading through the documentation of Julia which has made me aware of some of my gaps in knowledge in R. In Julia there is an operator available.
(true ? 1 : 2)
1
(false ? 1 : 2)
2
Try this
ifelse(condition, 1, 2)
Oddly enough, it is named ifelse() :-)
PS And while we're at it, do not use T and F, use TRUE and FALSE. Every self-respecting style-guide suggests so.
Simply ifelse
ifelse(TRUE,1,2)
## [1] 1
ifelse(FALSE,1,2)
## [1] 2
I am sure this is a simple question that has been asked many times, but this is one of those times when I find it difficult to know which terms to search for in order to find the solution. I have a simple list of lists, such as the one below:
sets <- list(S1=NA, S2=1L, S3=2:5)
> sets
$S1
[1] NA
$S2
[1] 1
$S3
[1] 2 3 4 5
And I have a scalar variable val which can take the value of any integer in sets (but will never be NA). Suppose val <- 4 -- then, what is a quick way to return a vector of TRUE/FALSE corresponding to each list in set where TRUE means val is in that list and FALSE means it is not? In this case I would want something like
[1] FALSE FALSE TRUE
I was hoping there would be some recursive form of %in% but I haven't had luck searching for it. Thank you!
Like this:
sapply(sets, `%in%`, x = val)
# S1 S2 S3
# FALSE FALSE TRUE
I had to look at the help page ?"%in%" to find out that the first argument to %in% is named x. And for your curiosity (not needed here), the second one is named table.
So I've been playing around with a data frame in R, although I'm still thinking too much in Python and cannot seem to find a solution for my problem.
I have a data frame and one of the column is an user id. I would like to remove all the first occurrence of a number, for instance:
1,2,3,4,3,4,2,1,3,4,6,7,7
I would like to have an output like this:
3,4,2,1,3,4,7
Where the first time the user_id appears I would remove it but keep all the others even if repeated.
With python I would probably use enumerate or loop over it. For R, I've seen some functions that seem cool but I'm not sure how to use it with the data frame, like rle.
Any pointers will be really helpful since right now I'm a bit lost about the best approach for this problem.
Thank you all
The function duplicated() is going to be helpful here:
x <- c(1,2,3,4,3,4,2,1,3,4,6,7,7)
> x[duplicated(x)]
[1] 3 4 2 1 3 4 7
This works because duplicated() returns a logical vector indicating whether that element is, well, duplicated:
duplicated(x)
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
You then use this logical vector to subset (extract) the values you want from x. But notice that in the extraction I keep all of the duplicated values, not remove them.
To remove all of the duplicated values (not what you want, but I illustrate regardless), try the negation:
x[!duplicated(x)]
[1] 1 2 3 4 6 7