R: How to interpret square brackets with forms like y[i : j - k]

Can you help me understand how R interprets square brackets with forms such as y[i:j - k]?
dummy data:
y <- c(1, 2, 3, 5, 7, 8)
Here's what I do understand:
y[i] is the ith element of vector y.
y[i:j] is the ith to jth element (inclusive) of vector y.
y[-i] is vector y without the ith element, etc.
However, what I don't understand is what happens when you start mixing these options, and I haven't found a good resource for explaining it.
For example:
y[1-1:4]
[1] 5 7 8
So y[1-1:4] returns the vector without the first three elements. But why?
and
y[1-4]
[1] 1 2 5 7 8
So y[1-4] returns the vector without the third element. Is that because 1-4 = -3 and it's interpreting it the same as y[-3]? If so, that doesn't seem consistent with my previous example where y[1-1:4] would presumably be interpreted as y[0:4], but that isn't the case.
and
y[1:1+2-1]
[1] 2
Why does this return the second element? I encountered this while I was trying to code something along the lines of: y[i:i + j - k] and it took me a while to figure out that I should write y[i:(i + j - k)] so the parentheses captured the whole of the right-hand side of the colon. But I still can't figure out what logic R was following when I didn't have those parentheses.
Thanks!

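It's best to look closer at operator precedence and the integer sequences you use for subsetting. These are evaluated before subsetting with []. The sequence operator : has higher precedence than binary + and -, so 1-1:4 is parsed as 1 - (1:4). In other words, - is a function with two arguments (1, 1:4) which are evaluated beforehand, and so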
> 1-1:4
[1] 0 -1 -2 -3
Negative indices in [] mean exclusion of the corresponding elements. There is no "0" element: subsetting at 0 alone returns an empty vector of the present type, numeric(0), and a 0 mixed with other indices is ignored. We thus expect y[1-1:4], i.e. y[c(0, -1, -2, -3)], to drop the first three elements of y and return the remainder.
As you write correctly, y[1-4] is y[-3], i.e. omission of the third element.
Similarly, in 1:1+2-1, 1:1 evaluates to the one-element vector 1 and the rest is simple arithmetic: 1 + 2 - 1 = 2, so y[2] is returned.
For more on operator precedence, see Hadley's excellent book.
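As a minimal sketch of the precedence rules described above, the index expressions can be evaluated on their own before they are used for subsetting. The values chosen for i, j and k are hypothetical and only serve to illustrate the y[i:i + j - k] case from the question.

```r
y <- c(1, 2, 3, 5, 7, 8)

# `:` binds tighter than binary `-`, so this is 1 - (1:4)
idx_a <- 1 - 1:4
idx_a            # 0 -1 -2 -3
y[idx_a]         # 5 7 8 (the 0 is ignored, elements 1 to 3 are dropped)

# Parsed as (1:1) + 2 - 1, i.e. the single number 2
idx_b <- 1:1 + 2 - 1
y[idx_b]         # 2

# Hypothetical values, not taken from the question
i <- 1; j <- 3; k <- 1
idx_c <- i:(i + j - k)   # 1:3, the intended range
idx_d <- i:i + j - k     # (i:i) + j - k, the single number 3
y[idx_c]         # 1 2 3
y[idx_d]         # 3
```

Printing the index vector first, as done here, is usually the quickest way to see what [] actually receives.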

Related

Why does R return integer(0) for under-indexing but NA for over-indexing a vector?

Say I have a vector, for example, x <- 1:10, then x[0] returns a zero-length vector of the same class as x, here integer(0).
I was wondering if there is a reason behind that choice, as opposed to throwing an error, or returning NA as x[11] would? Also, if you can think of a situation where having x[0] return integer(0) is useful, thank you for including it in your answer.
As seen in ?"["
NA and zero values are allowed: rows of an index matrix containing a
zero are ignored, whereas rows containing an NA produce an NA in the
result.
So an index of 0 just gets ignored. We can see this in the following
x <- 1:10
x[c(1, 3, 0, 5, 0)]
#[1] 1 3 5
So if the only index we give it is 0 then the appropriate response is to return an empty vector.
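The three cases discussed here (a lone 0, an index past the end, and zeros mixed with other indices) can be compared side by side. This is only an illustrative sketch; the variable names are made up for the example.

```r
x <- 1:10

only_zero <- x[0]               # integer(0): nothing selected
past_end  <- x[11]              # NA: a position was requested, but nothing is there
mixed     <- x[c(0, 2, 0, 11)]  # zeros are dropped, the rest behave as usual

only_zero
past_end
mixed                           # 2 NA
length(only_zero)               # 0
```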
My crack at it, as I am not a programmer and certainly do not contribute to the R source. I think it may be because you need some sort of placeholder to state that something occurred here but nothing was returned. This becomes more apparent with things like table and split. For instance, when you make a table of values and there are zero observations in a cell, you still need to record that the cell has no values. It would not be appropriate to have x[0] == 0, as the result is not the numeric value zero but the absence of any value.
So in the following splits we need a placeholder, and integer(0) holds the place of no values returned, which is not the same as 0. Notice that the second one returns numeric(0), which is still a placeholder, stating that the empty result is numeric.
with(mtcars, split(as.integer(gear), list(cyl, am, carb)))
with(mtcars, split(gear, list(cyl, am, carb)))
So in a way my x[FALSE] retort is true in that it holds the place of the non existent zero spot in the vector.
All right this balonga I just spewed is true until someone disputes it and tears it down.
PS: page 19 of this guide (LINK) states that integer() and integer(0) are empty integer vectors.
Related SO post: How to catch integer(0)?
Since the array indices are 1-based, index 0 has no meaning. The value is ignored as a vector index.

How to find if two or more consecutive elements of a vector are equal in R

I want to find a way to determine if two or more consecutive elements of a vector are equal.
For example, in vector x=c(1,1,1,2,3,1,3), the first, the second and the third element are equal.
With the following command, I can determine if a vector, say y, contains two or more consecutive elements that are equal to 2 or 3 (it returns TRUE when there are no such repeats):
all(rle(y)$lengths[which( rle(y)$values==2 | rle(y)$values==3 )]==1)
Is there any other faster way?
EDIT
Let's say we have the vector z=c(1,1,2,1,2,2,3,2,3,3).
I want a vector with three elements as output. The first element refers to the value 1, the second to 2 and the third to 3. Each element of the output vector equals 1 if two or more consecutive elements of z are equal to the corresponding value (1, 2 or 3), and 0 otherwise. So the output for the vector z will be (1,1,1).
For the vector w=c(1,1,2,3,2,3,1) the output will be 1,0,0, since only the value 1 appears in two consecutive elements, namely in the first and second positions of w.
I'm not entirely sure I understand your question, as it could be worded better. The first part just asks how to find whether consecutive elements in a vector are equal. The answer is to use the diff() function combined with a check for a difference of zero:
z <- c(1,1,2,1,2,2,3,2,3,3)
sort(unique(z[which(diff(z) == 0)]))
# [1] 1 2 3
w <- c(1,1,2,3,2,3,1)
sort(unique(w[which(diff(w) == 0)]))
# [1] 1
But your edit example seems to imply you are looking to see if there are repeated units in a vector whose values will only be the integers 1, 2, or 3. Your output will always be X, Y, Z, where
X is 1 if there is at least one "1" repeated, else 0
Y is 2 if there is at least one "2" repeated, else 0
Z is 3 if there is at least one "3" repeated, else 0
Is this correct?
If so, see the following
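continuously <- function(x){
  # values that are immediately followed by an equal value
  s <- sort(unique(x[which(diff(x) == 0)]))
  output <- c(0,0,0)
  # write each repeated value into its own position
  output[s] <- s
  return(output)
}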
continuously(z)
# [1] 1 2 3
continuously(w)
# [1] 1 0 0
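If the 0/1 indicators from the question's edit are wanted instead (1,1,1 for z and 1,0,0 for w), one possible variant is sketched below. The function name has_repeat and its levels argument are hypothetical, and the sketch assumes the values of interest are known in advance (1, 2 and 3 here).

```r
# Hypothetical helper: 1 if a value occurs in two or more consecutive
# positions of x, 0 otherwise, reported for each value in `levels`
has_repeat <- function(x, levels = 1:3) {
  repeated <- x[which(diff(x) == 0)]  # values followed by an equal value
  as.integer(levels %in% repeated)
}

z <- c(1, 1, 2, 1, 2, 2, 3, 2, 3, 3)
w <- c(1, 1, 2, 3, 2, 3, 1)

has_repeat(z)   # 1 1 1
has_repeat(w)   # 1 0 0
```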
Assuming your series is z=c(1,1,2,1,2,2,3,2,3,3), you can do:
(unique(z[c(FALSE, diff(z) == 0)]) >= 0)+0 which outputs 1, 1, 1.
When you run the above command on your other sequence:
w=c(1,1,2,3,2,3,1)
then (unique(w[c(FALSE, diff(w) == 0)]) >= 0)+0 returns 1.
You may also try this for an exact output like 1,1,1 or 1,0,0:
(sort(unique(z)) %in% z[c(FALSE, diff(z) == 0)])+0 #1,1,1 for z and 1,0,0 for w
Logic:
diff takes the difference between each item and the one before it. Since there is always one fewer difference than there are items, I have prepended FALSE for the first item. The resulting logical vector (is the difference zero or not?) is then used to subset the original sequence. Finally, we convert the values to 1s by asking whether they are greater than or equal to 0 (you may also use some other condition that always holds to get the 1s).
Assuming your sequence doesn't have negative numbers.
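For comparison, the rle() idea from the question can be written so that rle() is called only once. This is a sketch rather than a benchmarked alternative, and the function name repeated_values is made up for the example.

```r
# Hypothetical helper: values that occur in a run of length two or more
repeated_values <- function(x) {
  runs <- rle(x)                                  # computed once
  sort(unique(runs$values[runs$lengths >= 2]))
}

z <- c(1, 1, 2, 1, 2, 2, 3, 2, 3, 3)
w <- c(1, 1, 2, 3, 2, 3, 1)

repeated_values(z)   # 1 2 3
repeated_values(w)   # 1
```

Unlike the >= 0 trick, this does not depend on the values being non-negative.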

Series vector for approximating pi

I've been set a question about Madhava's approximation of pi. The first part of it is to create a vector which contains the first 20 terms in the series. I know I could just input the first 20 terms into a vector, however that seems like a really long winded way of doing things. I was wondering if there is an easier way to create the vector?
Currently I have the vector
g = c((-3)^(-0)/(2*0+1), (-3)^(-1)/(2*1+1), (-3)^(-2)/(2*2+1), (-3)^(-3)/(2*3+1), (-3)^(-4)/(2*4+1), (-3)^(-5)/(2*5+1), (-3)^(-6)/(2*6+1), (-3)^(-7)/(2*7+1), (-3)^(-8)/(2*8+1), (-3)^(-9)/(2*9+1), (-3)^(-10)/(2*10+1), (-3)^(-11)/(2*11+1), (-3)^(-12)/(2*12+1), (-3)^(-13)/(2*13+1), (-3)^(-14)/(2*14+1), (-3)^(-15)/(2*15+1), (-3)^(-16)/(2*16+1), (-3)^(-17)/(2*17+1), (-3)^(-18)/(2*18+1), (-3)^(-19)/(2*19+1), (-3)^(-20)/(2*20+1))
And
h = sqrt(12)
So I have done g*h to get the approximation of pi. Surely there's an easier way of doing this?
Apologies if this is relatively basic, I am very new to R and still learning how to properly use stack overflow.
Thanks.
One of the best features of R is that it is vectorised. This means that we can do operations element-wise on entire vectors rather than having to type out the operation for each element. For example, if you wanted to find the square of the first five natural numbers (starting at one), we can do this:
(1:5)^2
which results in the output
[1] 1 4 9 16 25
instead of having to do this:
c(1^2, 2^2, 3^2, 4^2, 5^2)
which gives the same output.
Applying this amazing property of R to your situation, instead of having to manually construct the whole vector, we can just do this:
series <- sqrt(12) * c(1, -1) / 3^(0:19) / seq(from=1, by=2, length.out=20)
sum(series)
which gives the following output:
[1] 3.141593
and we can see more decimal places by doing this:
sprintf("%0.20f", sum(series))
[1] "3.14159265357140338182"
To explain a little further what I did in that line of code to generate the series:
We want to multiply the entire thing by the square root of 12, hence the sqrt(12), which will be applied to every element of the resulting vector
We need the signs of the series to alternate, which is accomplished via * c(1, -1); this is because of recycling, where R recycles elements of vectors when doing vector operations. It will multiply the first element by one, the second element by -1, then recycle and multiply the third element by 1, the fourth by -1, etc.
We need to divide each element by 1, 3, 9, etc., which is accomplished by / 3^(0:19) which gives / c(3^0, 3^1, ...)
Lastly, we also need to divide by 1, 3, 5, 7, etc. which is accomplished by seq(from=1, by=2, length.out=20) (see help(seq))
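The same 20 terms can also be written directly from the formula used in the question, (-3)^(-k)/(2*k+1), by letting k be a vector. This is just an equivalent sketch; the names k, terms and approx_pi are chosen for illustration.

```r
k <- 0:19                          # indices of the first 20 terms
terms <- (-3)^(-k) / (2 * k + 1)   # evaluated element-wise for every k
approx_pi <- sqrt(12) * sum(terms)

length(terms)          # 20
approx_pi
abs(approx_pi - pi)    # size of the remaining error
```

Note that the terms have to be summed before (or after) multiplying by sqrt(12); multiplying the vector alone gives 20 scaled terms, not an approximation of pi.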

array index difference notation Python <-> R

What is the Python notation a[i-j] translated to R? As far as I understand it, it should be the array element at position i-j. But in R it seems to be the array up to the ith element with the element at position j subtracted.
R and Python have somewhat similar indexing properties, with the main difference being that indexing in Python starts at 0 while in R it starts at 1. Beyond the index start, Python supports negative indexing (counting back from the end), while in R a negative index means that you are removing the element at that exact index from your vector. To be specific to your case, list[i-j] picks out a single element in both languages when i - j is a positive integer, although not the element at the same position, because of the different index start. Otherwise, you are talking about two completely different things. The illustration below should be helpful to you:
Python:
#Create a list
lst = [1,3,5,6,7,7]
#index element at 4-2 (which is 2)
lst[4-2] # returns 5
#index element at 2-4 (which is -2) or lst[len(lst)-2]
lst[2-4] # returns 7
R:
lst <- c(1,3,5,6,7,7)
#indexing element at 4-2 (which is 2)
lst[4-2] # returns 3 (because R indexing starts at 1, not 0)
[1] 3
#BUT indexing element at 2-4 (which is -2) does not select from the end,
#because it means that you are removing the element at index 2, i.e. 3
lst[2-4] #returns the original list without the element at index 2
[1] 1 5 6 7 7
These are the main differences in indexing a list that I could offer to help with your question. The differences in indexing become more prominent as you tackle more complicated data structures in both languages.
I hope this is helpful.
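To reproduce the two Python results from the illustration in R, the index has to be shifted by hand. The sketch below assumes i - j is a whole number and uses the same example data; it is one possible translation, not the only one.

```r
lst <- c(1, 3, 5, 6, 7, 7)
i <- 4; j <- 2

# Python lst[4-2] is 0-based, so add 1 to reach the same element in R
same_as_python <- lst[i - j + 1]
same_as_python                      # 5

# Python lst[-2] means "second from the end"; in R, count back from length()
second_from_end <- lst[length(lst) - 2 + 1]
second_from_end                     # 7

# Two other ways to count from the end
rev(lst)[2]                         # 7
tail(lst, 2)[1]                     # 7
```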

R in simple terms - why do I have to feel like such an idiot?

My question is simple... every reference I find in books and on the internet for learning R programming is presented in a very linear way with no context. When I try to learn things like functions, I see the code and my brain just freezes because it's looking for something to relate these R terms to and I have no frame of reference. I have a PhD and did a lot of statistics for my dissertation, but that was years ago when we were using different programming languages, and when it comes to R, I don't know why I can't get this into my head. Is there someone who can explain in plain English an example of this simple code? So for example:
above <- function(x, n){
  use <- x > n
  x[use]
}
x <- 1:20
above(x, 12)
## [1] 13 14 15 16 17 18 19 20
I'm trying to understand what's going on in this code but simply don't. As a result, I could never just write this code on my own because I don't have the language in my head that explains what is happening with this. I get stuck at the first line:
above <- function(x, n) {
Can someone just explain this code sample in plain English so I have some kind of context for understanding what I'm looking at and why I'm doing what I'm doing in this code? And what I mean by plain English is, walking through the code, step by step and not just repeating the official terms from R like vector and function and array and all these other things, but telling me, in a common sense way, what this means.
Given your background (a PhD and a lot of statistics), the best way to understand this is in mathematical terms.
Mathematically speaking, you are defining a parametric function named above that extracts all elements of a vector x that are above a certain value n. You are just filtering the set or the vector x.
In sets notation you can write something like :
above:{x,n} --> {y in x ; y>n}
Now, going through the code and paraphrasing it (the math on the left, its equivalent in R on the right):
Math                     R
----------------         ---------------------
above: (x,n)       <---> above <- function(x, n)
{y in x ; y>n}     <---> x[x > n]
So to wrap all the statements together within a function, you should respect this syntax:
function_name <- function(arg1,arg2) { statements}
Applying the above to this example (we have one statement here) :
above <- function(x,n) { x[x>n]}
Finally calling this function is exactly the same thing as calling a mathematical function.
above(x,2)
OK, I will try. If this is too detailed let me know, but I tried to go really slowly:
above <- function(x, n)
This defines a function, which is just some procedure that produces some output given some input. The <- means assign what is on the right-hand side to what is on the left-hand side, or in other words put everything on the right into the object on the left. For example, container <- 1 puts 1 into container. In this case we put a function inside the object above.
function(x, n): everything in the parentheses specifies what inputs the function takes, so this one takes two variables, x and n.
Now we come to the body of the function, which defines what it does with the inputs x and n. The body of the function is everything inside the curly braces:
{
  use <- x > n
  x[use]
}
so let's explain that piece by piece:
use <- x > n
This part again puts what's on the right side into the object on the left. And what is happening on the right-hand side? A comparison that returns TRUE if x is bigger than n and FALSE if x is equal to or smaller than n. So if x is 5 and n is 3, the result will be TRUE, and this value gets stored inside use, so use now contains TRUE. If we have more than one value inside x, then every value inside x gets compared to n. For example, if x = c(1, 2, 3) and n = 2,
then we have
1 > 2 FALSE
2 > 2 FALSE
3 > 2 TRUE
so use will contain FALSE, FALSE, TRUE.
x[use]
Now we are taking a part of x. The square brackets specify which parts of x we want. In my example x has 3 elements and use has 3 elements; if we line them up we have:
x use
1 FALSE
2 FALSE
3 TRUE
So now we say: I don't want 1 and 2, but I do want 3, and the result is 3.
So now we have defined the function. Next we call it, or in normal words, we use it:
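The small three-element example can be typed in directly to see each intermediate step. This is only a sketch of the walkthrough above, using the same values x = 1, 2, 3 and n = 2.

```r
x <- c(1, 2, 3)
n <- 2

use <- x > n     # one comparison per element of x
use              # FALSE FALSE TRUE

kept <- x[use]   # keep only the elements where use is TRUE
kept             # 3

which(use)       # 3, the position of the kept element
```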
x <- 1:20
above(x, 12)
First we assign the numbers 1 through 20 to x, and then we tell the function above to execute (do everything inside its curly braces) with the inputs x = 1:20 and n = 12. In other words, we do the following:
above(x, 12)
execute the function above with the inputs x = 1:20 and n = 12
use <- 1:20 > 12
compare every number in 1:20 to 12 and return TRUE for each number that is in fact bigger than 12 and FALSE otherwise, then store all the results inside use
x[use]
now give me the corresponding elements of x for which the vector use contains TRUE
so:
x use
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 FALSE
8 FALSE
9 FALSE
10 FALSE
11 FALSE
12 FALSE
13 TRUE
14 TRUE
15 TRUE
16 TRUE
17 TRUE
18 TRUE
19 TRUE
20 TRUE
so we get the numbers 13:20 back as a result
I'll give it a crack too. A few basic points that should get you going in the right direction.
1) The idea of a function. Basically, a function is reusable code. Say I know that in my analysis for some bizarre reason I will often want to add two numbers, multiply them by a third, and divide them by a fourth. (Just suspend disbelief here.) So one way I could do that would just be to write the operation over and over, as follows:
(75 + 93)*4/18
(847 + 3)*3.1415/2.7182
(999 + 380302)*-6901834529/2.5
But that's tedious and error-prone. (What happens if I forget a parenthesis?) Alternatively, I can just define a function that takes whatever numbers I feed into it and carries out the operation. In R:
stupidMath <- function(a, b, c, d){
  result <- (a + b)*c/d
  result  # hand the computed value back to the caller
}
That code says "I'd like to store this series of commands and attach them to the name "stupidMath." That's called defining a function, and when you define a function, the series of commands is just stored in memory---it doesn't actually do anything until you "call" it. "Calling" it is just ordering it to run, and when you do so, you give it "arguments" ---the stuff in the parentheses in the first line are the arguments it expects, i.e., in my example, it wants four distinct pieces of data, which will be called 'a', 'b', 'c', and 'd'.
Then it'll do the things it's supposed to do with whatever you give it. "The things it's supposed to do" is the stuff in the curly brackets {} --- that's the "body" of the function, which describes what to do with the arguments you give it. So now, whenever you want to carry out that mathematical operation you can just "call" the function. To do the first computation, for example, you'd just write stupidMath(75, 93, 4, 18). Then the function gets executed, treating 75 as 'a', 93 as 'b', and so forth.
In your example, the function is named "above" and it takes two arguments, denoted 'x' and 'n'.
2) The "assignment operator": R is unusual among major programming languages in using <- for assignment; it plays the role that = plays in most other languages, i.e., it says "the name on the left has the value on the right." Conceptually, it's just like how a variable in algebra works.
3) So the "body" of the function (the stuff in the curly brackets) first assigns the name "use" to the expression x > n. What's going on there? Well, an expression is something that the computer evaluates to get data. So remember that when you call the function, you give it values for x and n. The first thing this function does is figure out whether x is greater than n or not. If it's greater than n, it evaluates the expression x > n as TRUE. Otherwise, FALSE.
So if you were to define the function in your example and then call it with above(10, 5), then the first line of the body would set the local variable (don't worry right now about what a 'local' variable is) 'use' to be 'TRUE'. This is a boolean value.
Then the next line of the function is a "filter." Filtering is a long topic in R, but basically, R thinks of everything as a "vector," that is, a bunch of pieces of data in a row. A vector in R can be like a vector in linear algebra, i.e., (1, 2, 3, 4, 5, 99) is a vector, but it can also be of stuff other than numbers. For now let's just focus on numbers.
The wacky thing about R (one of the many wacky things about R) is that it treats a single number (a "scalar" in linear algebra terms) just as a vector with only one item in it.
Ok, so why did I just go into that? Because in lots of places in R, a vector and a scalar are interchangeable.
So in your example code, instead of giving a scalar for the first argument, when we call the function we've given 'above' a vector for its first argument. R likes vectors. R really likes vectors. (Just talk to R people for a while. They're all obsessed with doing every goddamn thing in terms of a vector.) So it's no problem to pass a vector for the first argument. But what that means is that the variable 'use' is going to be a vector too. Specifically, 'use' is going to be a vector of booleans, i.e., of TRUE or FALSE for each individual value of x.
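A quick way to convince yourself that a single number is just a vector of length one is to poke at it in the console. The sketch below uses a made-up variable s; the same comparison operator then works unchanged on a longer vector.

```r
s <- 5

length(s)            # 1
is.vector(s)         # TRUE
identical(s, c(5))   # TRUE: c(5) builds the very same thing
s[1]                 # 5, a "scalar" can be indexed like any vector

s > 3                # TRUE
c(5, 10) > 7         # FALSE TRUE: one answer per element
```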
To take a simpler version: suppose you said:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
When the code runs, the first thing it's going to do is define that 'use' variable. But x is a vector now, not a scalar (the c(5,10) code said "make a vector with two elements, and fill them with the numbers '5' and '10'"), so R's going to go ahead and carry out the comparison for each element of x. Since 5 is less than 7 and 10 is greater than 7, use becomes the two-item vector of boolean values (FALSE, TRUE).
Ok, now we can talk about filtering. So a vector of boolean values is called a 'logical vector.' And the code x[use] says "filter x by the stuff in the variable use." When you tell R to filter something by a logical vector, it spits back out the elements of the thing being filtered which correspond to the values of 'TRUE'
So in the example just given:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
the value of myresult will just be 10. Why? Because the function filtered 'x' by the logical vector 'use,' 'x' was (5, 10), and 'use' was (FALSE, TRUE); since the second element of the logical vector was the only TRUE, you only got the second element of x.
And that gets assigned to the variable myresult because myresult <- above(mynums, 7) means "assign the name myresult to the value of above(mynums, 7)"
voila.

Resources