$value in unidimensional integrals in R [duplicate] - r

I have transitioned from STATA to R, and I was experimenting with different data types so that R's data structures are clear in my mind.
Here's how I set up my data structure:
b<-list(u=5,v=12)
c<-list(u=7)
j<-list(name="Joe",salary=55000,union=T)
bcj<-list(b,c,j)
Now, I was trying to figure out different ways to access u=5. I believe there are three ways:
Try1:
bcj[[1]][[1]]
I got 5. Correct!
Try2:
bcj[[1]][["u"]]
I got 5. Correct!
Try3:
bcj[[1]]$u
I got 5. Correct!
Try4:
bcj[[1]][1][1]
Here's what I got:
bcj[[1]][1][1]
$u
[1] 5
class(bcj[[1]][1][1])
[1] "list"
Question 1: Why did this happen?
Also, I experimented with the following:
bcj[[1]][1][1][1][1][1]
$u
[1] 5
class(bcj[[1]][1][1][1][1][1])
[1] "list"
Question 2: I would have expected an error because I don't think so many lists exist in bcj, but R gave me a list. Why did this happen?
PS: I did look at this thread on SO, but it's talking about a different issue.

I think this is sufficient to answer your question. Consider a length-1 list:
x <- list(u = 5)
#$u
#[1] 5
length(x)
#[1] 1
x[1]
x[1][1]
x[1][1][1]
...
always gives you the same:
#$u
#[1] 5
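In other words, x[1] is identical to x. This is not recursion: [ applied to a list returns a list of the selected elements, and selecting the only element of a length-1 list gives back the same list. No matter how many [1] you write, you just get x itself.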
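A quick way to check this claim is identical(). This is a minimal sketch reusing the x defined above:

```r
x <- list(u = 5)

# `[` on a list returns a list, so selecting the only element gives x back
same_once  <- identical(x[1], x)
same_many  <- identical(x[1][1][1][1], x)

# `[[` is what actually extracts the content
content <- x[[1]]

print(c(same_once, same_many))  # both TRUE
print(content)                  # 5
```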
If I create t1<-list(u=5,v=7) and then do t1[2][1][1][1]..., this works as well. However, t1[[2]][2] gives NA.
That is the difference between [[ and [ when indexing a list. Using [ will always end up with a list, while [[ will take out the content. Compare:
z1 <- t1[2]
## this is a length-1 list
#$v
#[1] 7
class(z1)
# "list"
z2 <- t1[[2]]
## this takes out the content; in this case, a vector
#[1] 7
class(z2)
#[1] "numeric"
When you do z1[1][1]..., as discussed above, you always end up with z1 itself. If you do z2[2], however, you get NA, because z2 has only one element and you are asking for the second.
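Applied to the bcj list from the question, the same reasoning can be sketched as follows (the names step1 and step2 are only for illustration):

```r
b   <- list(u = 5, v = 12)
bcj <- list(b, list(u = 7), list(name = "Joe", salary = 55000, union = TRUE))

# bcj[[1]] extracts b; the following [1] keeps a length-1 list holding u
step1 <- bcj[[1]][1]

# Any further [1] selects the only element of that length-1 list again
step2 <- step1[1][1][1]

print(identical(step1, step2))  # TRUE
print(class(step2))             # "list"
print(step2[[1]])               # 5, obtained only by using [[

# Indexing past the end of an atomic vector gives NA instead of an error
t1 <- list(u = 5, v = 7)
print(t1[[2]][2])               # NA
```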
Perhaps this post and my answer there is useful for you: Extract nested list elements using bracketed numbers and names?

Related

Is there no "multiple match vector" function in R?

I was trying to find a "readily available" function to do the following:
> my_array = c(5,9,11,10,6,5,9,13)
> my_array
[1] 5 9 11 10 6 5 9 13
> my_test <- c(5, 6)
> new_match_function(my_test, my_array)
[1] 1 5 6
# or instead, maybe:
# [[1]]
# [1] 1 6
# [[2]]
# [1] 5
For my purposes, %in% is close enough, since it will return:
> my_array %in% my_test
[1] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
and I could just do:
> seq(length(my_array))[my_array %in% my_test]
[1] 1 5 6
But it just seems that something like match should provide this capability: a means to return multiple elements from the match.
If I were to create a package simply to provide this solution, it would not be strongly adopted (for good reason... this tiny use case is not worth installing a package).
Is there a solution already available? If not, where is a good place for me to add this? As I showed, it's easy enough to solve without a new function, but for match to not allow for multiple matches seems crazy. I'd ideally like to either:
Find out that I'm wrong and there is a direct function to accomplish this, or
Be able to alter match itself so that it can return multiple occurrences.
But my impression (right or wrong) has been that any adjustments to the base code are more trouble than they are worth.
For simple cases, which(my_array %in% my_test) or lapply(my_test, function(x) which(my_array==x)) works fine, but those are not the most efficient.
For the first case (just knowing which elements match, not which test value each one corresponds to), the fastmatch package may help. It provides the %fin% (fast-in) function, which keeps a hash table of your array so that subsequent lookups are more efficient.
For the second case, there is findMatches in the Bioconductor package S4Vectors. (https://bioconductor.org/packages/release/bioc/html/S4Vectors.html)
Note that this function doesn't return a list but a Hits object. To get a list, you need the Bioconductor IRanges package as well (and use as.list). (https://bioconductor.org/packages/release/bioc/html/IRanges.html)
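For reference, the two base R approaches mentioned above look like this on the data from the question. The split-based variant at the end is an extra sketch, not something from the answer; it groups all positions once instead of scanning the array once per test value.

```r
my_array <- c(5, 9, 11, 10, 6, 5, 9, 13)
my_test  <- c(5, 6)

# Positions of every element of my_array that matches any value in my_test
all_pos <- which(my_array %in% my_test)

# Positions grouped by the value they matched, one list element per test value
by_value <- lapply(my_test, function(x) which(my_array == x))

# Alternative sketch: group all positions by value once, then look up my_test
groups <- split(seq_along(my_array), my_array)[as.character(my_test)]

print(all_pos)   # 1 5 6
print(by_value)  # positions 1 6 for the value 5, position 5 for the value 6
print(groups)    # same positions, named "5" and "6"
```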

Storing a value in a nested list with an unknown depth in R

I am trying to optimize code that is very computationally intensive, because it deals with subsets of an 80-element set.
A crucial step that I want to accelerate is finding out whether the current subset in my loop has already been treated. For the moment, I check whether this subset is contained in the already treated subsets of the same size k (cardinality). It would be much faster to store treated subsets progressively in a nested list and check that instead (O(1) instead of a search in O(80 choose k)).
I had no problem coding a function to check if the current subset is in my nested list of treated subset: access(treated, subset=c(2,5,3)) returns TRUE iff treated[[2]][[5]][[3]]==TRUE
However, I have no idea how to store (inside my loop) my current subset in the list of treated. I would like something like this to be possible: treated[h] <- TRUE where h is my current subset (in the above example: h=c(2,5,3))
The main problem that I am facing is that the number of "[[..]]" varies inside my loop. Do I have any option other than padding h to a length of 80 and writing a sequence of 80 "[[..]]", like: treated[[h[1]]][[h[2]]]...[[h[80]]] <- TRUE ?
If h is a vector of values then
"[["(treated, h)
recursively subsets the list items.
For example, I created a (not so highly) nested list:
> a
[[1]]
[[1]][[1]]
[1] 2
[[1]][[2]]
[[1]][[2]][[1]]
[1] 3
[[2]]
[1] 1
The following command recursively applies item subsetting to the list:
> "[["(a, c(1,2,1))
[1] 3
The length of the recursively subsetting vector can vary without fixing the number of [[..]]'s. For example, subsetting two levels of depth with the same syntax:
> "[["(a, c(1,2))
[[1]]
[1] 3
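The same vector index can also be used on the left-hand side of an assignment, which is what the question asks for. This is a minimal sketch that assumes every intermediate level of the path already exists in the list; it does not show how to create missing levels. The list a is rebuilt here to match the printed structure above.

```r
# Same shape as the list printed above
a <- list(list(2, list(3)), 1)
h <- c(1, 2, 1)

# Read: equivalent to a[[1]][[2]][[1]]
before <- a[[h]]

# Write at the same path; assumes a[[1]] and a[[1]][[2]] already exist as lists
a[[h]] <- TRUE
after <- a[[h]]

print(before)        # 3
print(after)         # TRUE
print(a[[c(1, 1)]])  # 2, the rest of the list is untouched
```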

Types and comparisons in R

I've been working with R for a month or so, and my comprehension of some subtleties is still quite superficial.
I have had an issue, which I managed to solve (details below), but I still can't explain precisely why it did not work with the first solution.
Note that the example below makes no practical sense for I have simplified it as much as possible so that the problem is quite clear.
ISSUE :
Given a data frame with 4 columns (email, first, last, company) :
> users <- data.frame(matrix(vector(), 0, 4, dimnames=list(c(), c("email", "first", "last", "company"))), stringsAsFactors=F)
> users[1,] <- c("robert#redford.com", "Robert", "Redford", "Paramount")
> users[2,] <- c("julia#roberts.com", "Erin", "B.", "Hinkley")
> users[3,] <- c("matt#damon.com", "Will", "H.", "Stanford")
> users[4,] <- c("john#malkovitch.com", "John", "M.", "JM")
I take one particular row :
> user <- users[3,]
When I try to subset the data frame on a criterion that should have returned the previously mentioned row, it returns no result.
> users[users$email == user["email"],]
[1] email first last company
<0 rows> (or 0-length row.names)
I instantly thought it was a casting issue (sorry for this bad one)
> users[users$email == as.character(user["email"]),]
email first last company
3 matt#damon.com Will H. Stanford
However, when I tried to figure out where exactly the issue was, and tried this :
> users[users$email == "matt#damon.com",]
email first last company
3 matt#damon.com Will H. Stanford
> user["email"] == "matt#damon.com"
email
3 TRUE
> users[3,]$email == user$email
[1] TRUE
I got quite confused :
First, I thought about it as a math problem : if A == B and B == C, then A == C (according to Captain Obvious). So, just replacing a member A by another member B which is supposed to be equal to A (given the "TRUE" statement) in some expression should have no impact on the result of this expression.
Second, 3 TRUE != [1] TRUE. I think [1] TRUE is a logical vector of length 1 whose first element is TRUE, while 3 TRUE is a (1x1) matrix row whose "email" column value is TRUE.
My problem is with consistency : either two objects of equal content but different types should be equal, or they should be different. I have a problem with "Sometimes there is type inference, and sometimes not". Is there a rule I can't see beyond this behavior ? (I guess there is one)
Another expression of the behavior I'd like to get is this one :
> unique(users$email) == "matt#damon.com"
[1] FALSE FALSE TRUE FALSE
> unique(users$email) == user["email"]
email
3 FALSE
Obviously R does get what I want (considering the fact that it gives me the matching row). But I can't explain (nor use) the result of the second statement.
Any explanations / thoughts?
In normal list situations, use [[ to extract the value itself:
users$email == user[["email"]]
However, in data.frames things get inconsistent / a lot worse!
tdf=data.frame(matrix(1:100,10,10))
tdf[] # returns data.frame everything
tdf[1] # returns data.frame first column
tdf[1,1] # returns the single element, with the type of its column
tdf[,1] # returns a vector of the first column
tdf[1,] # returns a data.frame of the first row # eeeeeugh... that is odd....
tdf[2:4] # returns a data.frame with 3 columns
tdf[1,2:4] # returns a data.frame of the first row of 3 colums
tdf[2:4,2:4] # returns a 3x3 data.frame
tdf[2:4,1] # returns a vector of 2:4 row and 1st column
tdf[,2:4] # returns a data.frame with 3 columns
Then there is also the double [[]].
Do note that in data.frames things get horribly annoying and fugly:
tdf[[1]] # gives the first column as a vector
tdf[[1,1]] # gives first element
and pretty much all other combinations give errors.
And assigning stuff to a data.frame or matrix is an even bigger mess!
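To make the difference concrete, here is a small sketch using a cut-down version of the users data frame from the question (only two of its columns, built directly rather than row by row). It also shows drop = FALSE, an argument of [ for data frames that is not mentioned above but keeps a single selected column as a data frame.

```r
users <- data.frame(
  email = c("robert#redford.com", "julia#roberts.com",
            "matt#damon.com", "john#malkovitch.com"),
  first = c("Robert", "Erin", "Will", "John"),
  stringsAsFactors = FALSE
)
user <- users[3, ]

print(class(user["email"]))    # "data.frame": [ keeps the data frame wrapper
print(class(user[["email"]]))  # "character": [[ extracts the column itself

# Comparing against the extracted character value selects the expected row
hit <- users[users$email == user[["email"]], ]
print(hit)

tdf <- data.frame(matrix(1:100, 10, 10))
print(class(tdf[, 1]))                # "integer": a single column is dropped to a vector
print(class(tdf[, 1, drop = FALSE]))  # "data.frame": drop = FALSE keeps the wrapper
print(class(tdf[[1]]))                # "integer": first column as a vector
```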

Subsetting in R

x <- list(l1=list(1:4),l2=list(2:5),l3=list(3:8))
I know [] is used for extracting multiple elements and [[]] is used to extract a single element in a list inside a list. I need help in extracting multiple elements in a list inside another list. For example I need to extract 1,3 from list l1 which is inside another list?
For full details, see help(Extract) which covers [[ and [
The [[ operator can walk/search nested lists in a single step, by providing a vector of names OR indices (a path):
> y = list(a=list(b=1))
> y[[c("a","b")]]
[1] 1
> y[[c(1,1)]]
[1] 1
You can't mix names and indices, because c("a",1) is coerced to the character vector c("a","1"):
> y[[c("a",1)]]
NULL
It seems like you are asking a different question, since your inner lists are not named.
Here's a solution using only numeric indices:
> x[[c(1,1)]]
[1] 1 2 3 4
> x[[c(1,1)]][c(1,3)]
[1] 1 3
The first 1 selects the first element of x (the list l1). The second 1 unwraps it to expose the vector inside.
This might be useful if your real use case involves more complex paths, but to avoid surprising other programmers, in the given example the following...
x[["l1"]][[1]][c(1,3)]
...is probably preferable. The second 1 unwraps the list.
In your case, the following is also equivalent
unlist(x[["l1"]])[c(1,3)]
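As a quick check, the three forms above can be compared with identical(). The last line is an extra sketch, not part of the answer, that applies the same extraction to every inner list at once.

```r
x <- list(l1 = list(1:4), l2 = list(2:5), l3 = list(3:8))

p1 <- x[[c(1, 1)]][c(1, 3)]         # path index, then vector subset
p2 <- x[["l1"]][[1]][c(1, 3)]       # name, unwrap, then vector subset
p3 <- unlist(x[["l1"]])[c(1, 3)]    # flatten l1, then vector subset

print(identical(p1, p2) && identical(p2, p3))  # TRUE

# Elements 1 and 3 of the vector inside each of l1, l2 and l3
all_sub <- lapply(x, function(el) el[[1]][c(1, 3)])
print(all_sub)
```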
It sounds like you might be interested in exploring the rapply function (recursive lapply).
If I understand your question correctly, you could do something like this:
rapply(x[["l1"]], function(v) v[c(1, 3)])
# [1] 1 3
which is a little different than:
lapply(x[["l1"]], `[`, c(1, 3))
# [[1]]
# [1] 1 3

Recursive %in% function in R?

I am sure this is a simple question that has been asked many times, but this is one of those times when I find it difficult to know which terms to search for in order to find the solution. I have a simple list of lists, such as the one below:
sets <- list(S1=NA, S2=1L, S3=2:5)
> sets
$S1
[1] NA
$S2
[1] 1
$S3
[1] 2 3 4 5
And I have a scalar variable val which can take the value of any integer in sets (but will never be NA). Suppose val <- 4. What is a quick way to return a vector of TRUE/FALSE corresponding to each element of sets, where TRUE means val is in that element and FALSE means it is not? In this case I would want something like
[1] FALSE FALSE TRUE
I was hoping there would be some recursive form of %in% but I haven't had luck searching for it. Thank you!
Like this:
sapply(sets, `%in%`, x = val)
# S1 S2 S3
# FALSE FALSE TRUE
I had to look at the help page ?"%in%" to find out that the first argument to %in% is named x. And for your curiosity (not needed here), the second one is named table.
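A variant of the same idea uses vapply, which additionally checks that each result is a single logical value. This is only an alternative sketch; the sapply call above already gives the requested output.

```r
sets <- list(S1 = NA, S2 = 1L, S3 = 2:5)
val  <- 4

# One TRUE/FALSE per element of sets; logical(1) declares the expected result shape
in_set <- vapply(sets, function(s) val %in% s, logical(1))

print(in_set)  # S1 FALSE, S2 FALSE, S3 TRUE
```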
