Issue using a list of Booleans to index in R - r

I've had experience programming for a few years, but am relatively new to R. I ran into an unexpected result when trying to extract an entry from an array using an array of Boolean entries:
array = c(2, 3, 4, 5);
array[c(FALSE, FALSE, FALSE, TRUE)]
#output: [1] 5
array[c(0, 0, 0, 1)]
#output: [1] 2
This surprised me, since I thought FALSE and 0 were interchangeable (likewise for TRUE and 1) in this sort of process. I checked the following to make sure, and became even more confused:
T==1
#output: [1] TRUE
F==0
#output: [1] TRUE
c(0,0,0,1)==c(F,F,F,T)
#output: [1] TRUE TRUE TRUE TRUE
Can someone help explain why R is treating these indexing methods differently?
Much thanks,

as.logical(c(0,0,0,1))==c(F,F,F,T)
Explanation
In R, numeric values are treated differently than the logical values.
In your scenario, Since 2 is the first element of the array, it is returned for the second subset operation (and there is no 0th element).
PS
Only for a few operations(e.g ==), logical values are first coerced to numeric. Thanks to #IceCreamTouchan to add this in the comments above.

Related

Removing certain elements from a vector in R [duplicate]

This question already has answers here:
How to delete multiple values from a vector?
(9 answers)
Closed 5 years ago.
I have a vector called data which has approximately 35000 elements (numeric). And I have a numeric vector A. I want to remove every appearance of elements of A in the vector data. For example if A is the vector [1,2]. I want to remove all appearance of 1 and 2 in the vector data. How can I do that? Is there a built in function that does this? I couldn't think of a way. Doing it with a loop would take a long time I assume. Thanks!
There is this handy %in%-operator. Look it up, one of the best things I can think of in any programming language! You can use it to check all elements of one vector A versus all elements of another vector B and returns a logical vector that gives the positions of all elements in A that can be found in B. It is what you need! If you are new to R, it might seem a bit weird, but you will get very much used to it.
Ok, so how to use it? Lets say datvec is your numeric vector:
datvec = c(1, 4, 1, 7, 5, 2, 8, 2, 10, -1, 0, 2)
elements_2_remove = c(1, 2)
datvec %in% elements_2_remove
## [1] TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
So, you see a vector that gives you the positions of either 1 or 2 in datvec. So, you can use it to index what yuo want to retain (by negating it):
datvec = datvec[!(datvec %in% elements_2_remove)]
And you are done!

$value in unidimensional integrals in R [duplicate]

I have transitioned from STATA to R, and I was experimenting with different data types so that R's data structures are clear in my mind.
Here's how I set up my data structure:
b<-list(u=5,v=12)
c<-list(u=7)
j<-list(name="Joe",salary=55000,union=T)
bcj<-list(b,c,j)
Now, I was trying to figure out different ways to access u=5. I believe there are three ways:
Try1:
bcj[[1]][[1]]
I got 5. Correct!
Try2:
bcj[[1]][["u"]]
I got 5. Correct!
Try3:
bcj[[1]]$u
I got 5. Correct!
Try4
bcj[[1]][1][1]
Here's what I got:
bcj[[1]][1][1]
$u
[1] 5
class(bcj[[1]][1][1])
[1] "list"
Question 1: Why did this happen?
Also, I experimented with the following:
bcj[[1]][1][1][1][1][1]
$u
[1] 5
class(bcj[[1]][1][1][1][1][1])
[1] "list"
Question 2: I would have expected an error because I don't think so many lists exist in bcj, but R gave me a list. Why did this happen?
PS: I did look at this thread on SO, but it's talking about a different issue.
I think this is sufficient to answer your question. Consider a length-1 list:
x <- list(u = 5)
#$u
#[1] 5
length(x)
#[1] 1
x[1]
x[1][1]
x[1][1][1]
...
always gives you the same:
#$u
#[1] 5
In other words, x[1] will be identical to x, and you fall into infinite recursion. No matter how many [1] you write, you just get x itself.
If I create t1<-list(u=5,v=7), and then do t1[2][1][1][1]...this works as well. However, t1[[2]][2] gives NA
That is the difference between [[ and [ when indexing a list. Using [ will always end up with a list, while [[ will take out the content. Compare:
z1 <- t1[2]
## this is a length-1 list
#$v
#[1] 7
class(z1)
# "list"
z2 <- t1[[2]]
## this takes out the content; in this case, a vector
#[1] 7
class(z2)
#[1] "numeric"
When you do z1[1][1]..., as discussed above, you always end up with z1 itself. While if you do z2[2], you surely get an NA, because z2 has only one element, and you are asking for the 2nd element.
Perhaps this post and my answer there is useful for you: Extract nested list elements using bracketed numbers and names?

Types and comparisons in R

I've been working with R for a month or so, and my comprehension of some subtleties is still quite superficial.
I have had an issue, which I managed to solve (details below), but I still can't explain precisely why it did not work with the first solution.
Note that the example below makes no practical sense for I have simplified it as much as possible so that the problem is quite clear.
ISSUE :
Given a data frame with 4 columns (email, first, last, company) :
> users <- data.frame(matrix(vector(), 0, 4, dimnames=list(c(), c("email", "first", "last", "company"))), stringsAsFactors=F)
> users[1,] <- c("robert#redford.com", "Robert", "Redford", "Paramount")
> users[2,] <- c("julia#roberts.com", "Erin", "B.", "Hinkley")
> users[3,] <- c("matt#damon.com", "Will", "H.", "Stanford")
> users[4,] <- c("john#malkovitch.com", "John", "M.", "JM")
I take one particular row :
> user <- users[3,]
When I try to subset the dataframe on a criteria which could have lead to return the previously mentioned row, it returns no result.
> users[users$email == user["email"],]
[1] email first last company
<0 lignes> (ou 'row.names' de longueur nulle)
I instantly thought it was a casting issue (sorry for this bad one)
> users[users$email == as.character(user["email"]),]
email first last company
3 matt#damon.com Will H. Stanford
However, when I tried to figure out where exactly the issue was, and tried this :
> users[users$email == "matt#damon.com",]
email first last company
3 matt#damon.com Will H. Stanford
> user["email"] == "matt#damon.com"
email
3 TRUE
> users[3,]$email == user$email
[1] TRUE
I got quite confused :
First, I thought about it as a math problem : if A == B and B == C, then A == C (according to Captain Obvious). So, just replacing a member A by another member B which is supposed to be equal to A (given the "TRUE" statement) in some expression should have no impact on the result of this expression.
3 TRUE != [1] TRUE. I think [1] TRUE is a logical vector of size 1 which first element is TRUE. 3 TRUE is (1x1) matrix row, which column "email" value is TRUE.
My problem is with consistency : either two objects of equal content but different types should be equal, or they should be different. I have a problem with "Sometimes there is type inference, and sometimes not". Is there a rule I can't see beyond this behavior ? (I guess there is one)
Another expression of the behavior I'd like to get is this one :
> unique(users$email) == "matt#damon.com"
[1] FALSE FALSE TRUE FALSE
> unique(users$email) == user["email"]
email
3 FALSE
Obviously R does get what I want (considering the fact that it gives me the matching row). But I can't explain (nor use) the result of the second statement.
Any explanations / thoughts?
in normal list situations
users$email == user[["email"]]
however in data.frames things get inconsistent/ a lot worse!
tdf=data.frame(matrix(1:100,10,10))
tdf[] # returns data.frame everything
tdf[1] # returns data.frame first column
tdf[1,1] # returns object as type of the object...
tdf[,1] # returns a vector of the first column
tdf[1,] # returns a data.frame of the first row # eeeeeugh... that is odd....
tdf[2:4] # returns a data.frame with 3 columns
tdf[1,2:4] # returns a data.frame of the first row of 3 colums
tdf[2:4,2:4] # returns a 3x3 data.frame
tdf[2:4,1] # returns a vector of 2:4 row and 1st column
tdf[,2:4] # returns a data.frame with 3 columns
then there is also the double [[]]
do note that in data.frames things get horribly annoying and fugly
tdf[[1]] # gives the first row as a vector
tdf[[1,1]] # gives first element
and pretty much all other combinations gives errors
and assigning stuff to a data.frame or matrix, is an even bigger mess!

Couldn't reduce the looping variable inside the "for" loop in R

I have a for loop to do a matrix manipulation in R. For some checks are true i need to come to the same row again., means i need to be reduced by 1.
for(i in 1:10)
{
if(some chk)
{
i=i-1
}
}
Actually i is not reduced for me. For an example in 5th row i'm reducing the i to 4, so again it should come as 5, but it is coming as 6.
Please advice.
My intention is:
Checking the first column values of a matrix, if I find any duplicate value, I take the second column value and append with the first row's second column and remove the duplicate row. So, when I'm removing a row I do not need increase the i in while loop. (This is just a map reduce method, append values of same key)
Variables in R for loops are read-only, you cannot modify them. What you have written would be solved completely differently in normal R code – the exact solution depending on the actual problem, there isn’t a generic, direct replacement (except by replacing the whole thing with a while loop but this is both ugly and probably unnecessary).
To illustrate this, consider these two typical examples.
Assume you want to filter all duplicated elements from a list. Instead of looping over the list and copying all duplicated elements, you can use the duplicated function which tells you, for each element, whether it’s a duplicate.
Secondly, you use standard R subsetting syntax to select just those elements which are not a duplicate:
x = x[! duplicated(x)]
(This example works on a one-dimensional vector or list, but it can be generalised to more dimensions.)
For a more complex case, let’s say that you have a vector of numbers and, for every even number in the vector, you want to double the preceding number (this is highly artificial but in signal processing you might face similar problems). In other words:
input = c(1, 3, 2, 5, 6, 7, 1, 8)
output = ???
output
# [1] 1 6 2 10 6 7 2 8
… we want to fill in ???. In the first step, we check which numbers are even:
even = input %% 2 == 0
# [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
Next, we shift the result down – because we want to know whether the next number is even – by removing the first element, and appending a dummy element (FALSE) at the end.
even = c(even[-1], FALSE)
# [1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
And now we can multiply just these inputs by two:
output = input
output[even] = output[even] * 2
There, done.

Recursive %in% function in R?

I am sure this is a simple question that has been asked many times, but this is one of those times when I find it difficult to know which terms to search for in order to find the solution. I have a simple list of lists, such as the one below:
sets <- list(S1=NA, S2=1L, S3=2:5)
> sets
$S1
[1] NA
$S2
[1] 1
$S3
[1] 2 3 4 5
And I have a scalar variable val which can take the value of any integer in sets (but will never be NA). Suppose val <- 4 -- then, what is a quick way to return a vector of TRUE/FALSE corresponding to each list in set where TRUE means val is in that list and FALSE means it is not? In this case I would want something like
[1] FALSE FALSE TRUE
I was hoping there would be some recursive form of %in% but I haven't had luck searching for it. Thank you!
Like this:
sapply(sets, `%in%`, x = val)
# S1 S2 S3
# FALSE FALSE TRUE
I had to look at the help page ?"%in%" to find out that the first argument to %in% is named x. And for your curiosity (not needed here), the second one is named table.

Resources