x <- c(2,5,4,3,9,8,11,6)
count <- 0
for (val in x) {
  if (val %% 2 == 0) {
    count <- count + 1
  }
}
print(count)
# [1] 4
I do not get why it is 4 and not 5. Could someone give me a hint?
In x <- c(2,5,4,3,9,8,11,6), only four numbers leave a remainder of 0 when divided by 2: 2, 4, 8, and 6.
Knowing that, let's look at what each part of the code does. You define a variable count and assign it the value 0, so count starts at 0. The next line is a for loop, which goes through each value of x in turn. The if statement says that whenever the current value is divisible by 2 with no remainder, 1 is added to count. That is what the line count <- count + 1 does: it takes the current value of count, adds 1, and stores the result back in count, replacing the old value. Four values in x are divisible by 2, so count ends up as 0 + 4 = 4. You should get:
print(count)
[1] 4
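As a quick cross-check, the same count can be obtained without a loop. This is only a sketch of an alternative way to look at it, not a replacement for understanding the loop; the name is_even is made up for illustration.

```r
x <- c(2, 5, 4, 3, 9, 8, 11, 6)

# TRUE where the remainder after dividing by 2 is zero (is_even is an illustrative name)
is_even <- x %% 2 == 0

x[is_even]    # the values that pass the test: 2 4 8 6
sum(is_even)  # TRUE counts as 1, so this is the number of even values: 4
```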
Hope this helps explain every aspect of the code. I highly recommend reading up on for loops, if statements, and R basics in general. There are several tutorials online explaining all of these components.
In a simple recursion whose first if branch returns 0, and where the recursion keeps going until that condition is true, why isn't 0 always returned?
fun stepping (n : int, number : int) =
  if number > n
  then 0
  else 1 + stepping (n, number + 1)
It seems like the function stepping should add one onto number until number > n and then always return 0. Instead, it returns the number of times you went through the recursion cycle until number becomes greater than n.
The above code works in SML and gives me what I wanted: the number of steps, incrementing by 1, until the input "number" is greater than the input "n". But when I manually walk through the recursion steps, it seems like the return value should always be 0 once the incremented "number" > the input "n". What am I missing?
I think you're mistaking the result of the final call to stepping in the recursive chain (which will always be zero) for the ultimate value returned by the expression, but that is not the case. It is actually one term in a larger expression that makes up the overall returned value.
For example, if we look at how the expression gets built up as each recursive call is made when evaluating stepping(3, 1), you end up with...
result = stepping(3, 1)
result = 1 + stepping(3, 2)
result = 1 + 1 + stepping(3, 3)
result = 1 + 1 + 1 + stepping(3, 4)
result = 1 + 1 + 1 + 0
result = 3
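The same build-up can be watched in running code. The sketch below is a rough R translation of stepping (not the original SML), with a message() call added purely to show each call as it happens.

```r
# Rough R translation of the SML function; the message() line is added for tracing only
stepping <- function(n, number) {
  message("stepping(", n, ", ", number, ")")
  if (number > n) {
    0                              # base case: contributes 0 to the sum
  } else {
    1 + stepping(n, number + 1)    # the 1 is added after the inner call returns
  }
}

result <- stepping(3, 1)
result  # 3
```

The trace shows four calls; only the last one returns 0, and each of the three callers adds 1 to what it gets back.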
Let's say that I'm going to give you some money this year, according to this scheme:
you get nothing on the first of January
on all other days, you get one dollar more than I would have given you the day before
How much would I have to pay you today, the 20th of February?
Fifty dollars or nothing at all?
If you follow the calendar backwards, you will eventually reach January 1st, where the payment is zero, so would you expect to get nothing?
To answer your immediate question: the function does always return 0 if the first condition is met – that is, if number > n.
However, if the first condition isn't met – number <= n – it does not return 0 but 1 + stepping (n, number + 1).
It works exactly as if you had called a function with a different name; that function computes a value and then this function adds 1.
It's not like returning a value from inside a loop, such as (pseudocode)
while (true)
{
    if number > n
        return 0
    else
        number = number + 1
}
which is perhaps what you're thinking about.
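For contrast, here is that loop written as a runnable R sketch (stepping_loop is a made-up name). Because nothing is added to the value on the way out, it really does return 0 every time.

```r
# Loop version of the pseudocode above: counts up, then returns 0
stepping_loop <- function(n, number) {
  while (TRUE) {
    if (number > n) {
      return(0)            # the only value this function can ever return
    } else {
      number <- number + 1
    }
  }
}

stepping_loop(3, 1)  # 0
```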
I want to find a way to determine whether two or more consecutive elements of a vector are equal.
For example, in the vector x=c(1,1,1,2,3,1,3), the first, second, and third elements are equal.
With the following command, I can determine whether a vector, say y, contains two or more consecutive elements that are equal to 2 or 3:
all(rle(y)$lengths[which( rle(y)$values==2 | rle(y)$values==3 )]==1)
Is there any other faster way?
EDIT
Let's say we have the vector z=c(1,1,2,1,2,2,3,2,3,3).
I want a vector with three elements as output. The first element refers to the value 1, the second to 2, and the third to 3. Each element of the output is 1 if z contains two or more consecutive elements equal to that value, and 0 otherwise. So the output for the vector z will be (1,1,1).
For the vector w=c(1,1,2,3,2,3,1) the output will be (1,0,0), since only the value 1 appears in two consecutive positions, namely the first and second positions of w.
I'm not entirely sure I understand your question, as it could be worded better. The first part just asks how to find out whether consecutive elements in a vector are equal. The answer is to use the diff() function combined with a check for a difference of zero:
z <- c(1,1,2,1,2,2,3,2,3,3)
sort(unique(z[which(diff(z) == 0)]))
# [1] 1 2 3
w <- c(1,1,2,3,2,3,1)
sort(unique(w[which(diff(w) == 0)]))
# [1] 1
But the example in your edit seems to imply that you want to know which values are repeated consecutively in a vector that contains only the integers 1, 2, or 3. Your output will always be X, Y, Z, where
X is 1 if there is at least one "1" repeated, else 0
Y is 2 if there is at least one "2" repeated, else 0
Z is 3 if there is at least one "3" repeated, else 0
Is this correct?
If so, see the following
continuously <- function(x){
  s <- sort(unique(x[which(diff(x) == 0)]))
  output <- c(0,0,0)
  output[s] <- s
  return(output)
}
continuously(z)
# [1] 1 2 3
continuously(w)
# [1] 1 0 0
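If the output should be strictly 0/1, as in the question's edit, one possible variation uses rle() and %in%. This is a sketch under the assumption that the values of interest are 1, 2 and 3; the function name has_run and its values argument are invented here.

```r
# 1 if the value occurs at least twice in a row, else 0 (has_run is an illustrative name)
has_run <- function(x, values = 1:3) {
  r <- rle(x)                              # run lengths and the value of each run
  repeated <- r$values[r$lengths >= 2]     # values that form a run of length 2 or more
  as.integer(values %in% repeated)
}

z <- c(1,1,2,1,2,2,3,2,3,3)
w <- c(1,1,2,3,2,3,1)
has_run(z)  # 1 1 1
has_run(w)  # 1 0 0
```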
Assuming your vector is z=c(1,1,2,1,2,2,3,2,3,3), you can do:
(unique(z[c(FALSE, diff(z) == 0)]) >= 0)+0
which outputs 1, 1, 1.
When you run the same command on your other sequence,
w=c(1,1,2,3,2,3,1)
then (unique(w[c(FALSE, diff(w) == 0)]) >= 0)+0 returns 1.
You may also try this for exact output like 1,1,1 or 1,0,0:
(unique(z[c(FALSE, diff(z) == 0)]) == unique(z))+0 #1,1,1 for z and 1,0,0 for w
Logic:
diff() takes the difference between each item and the one before it. Since there is always one fewer difference than there are items, I prepend FALSE for the first item. The resulting logical vector (TRUE where the difference is zero) is then used to subset the original sequence. Finally, the selected values are converted to 1s by asking whether they are greater than or equal to 0 (any other condition that is always true would also give 1s).
This assumes your sequence doesn't contain negative numbers.
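One caveat, offered as an observation rather than a correction: the == comparison lines values up by position and recycles the shorter vector, so it gives the expected answer for z and w partly because of the order in which their values first appear. A sketch of a position-independent variant using %in% follows; the vector v and the name flag_runs are made up for illustration.

```r
# Flag which of the values 1:3 appear at least twice in a row (flag_runs is an illustrative name)
flag_runs <- function(x, values = 1:3) {
  repeated <- x[c(FALSE, diff(x) == 0)]   # elements equal to their predecessor
  (values %in% repeated) + 0              # logical to 0/1
}

v <- c(2, 2, 1, 3)                        # only the value 2 is repeated
(unique(v[c(FALSE, diff(v) == 0)]) == unique(v)) + 0   # 1 0 0, which points at value 1
flag_runs(v)                                           # 0 1 0
```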
I have a list of increasing year values that occasionally has breaks in it and I want to create a grouping value for each unbroken sequence. Think of a vector like this one (missing 2005,2011):
x <- c(2001,2002,2003,2004,2006,2007,2008,2009,2010,2013,2014,2015,2016)
I would like to produce an equal length vector that numbers every value in a run with the same index to end up with something like this.
[1] 1 1 1 1 2 2 2 2 2 3 3 3 3
I would like to do this using best R practices so I am trying to avoid falling back to a for loop but I am not sure how to get from Vector A to Vector B. Does anyone have any suggestions?
Some things I know I can do:
I can flag the record before or after a gap as true with an ifelse
I can get the index of when the counter should change by wrapping that in a which statement
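This is the code to do each (it assumes lag() is dplyr::lag, which shifts the vector by one position; base R's stats::lag does not shift the values of a plain vector):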
ifelse(!is.na(lag(x)) & x == lag(x)+1, FALSE, TRUE)
which(ifelse(!is.na(lag(x)) & x == lag(x)+1, FALSE, TRUE))
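I think there are a couple of solutions to this problem. One, posted by d.b in the comments, gives the desired grouping for this example. Note that it relies on each gap being larger than the previous one (here 2, then 3), because cummax() only increases when a difference exceeds every earlier difference, so it is not a general solution.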
cummax(c(1, diff(x)))
There is a similar solution that I chose to use, with ifelse() flagging breaks and cumsum() turning the flags into a counter. I chose this solution because additional information, like other vectors, can be included in the decision, and the diff()-based approach seems to have problems with very erratic up and down values.
cumsum(ifelse(!is.na(lag(x)) & x == lag(x) + 1, FALSE, TRUE))
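For reference, a base R sketch of the same flag-and-accumulate idea without lag(): a new group starts wherever the step from the previous year is not exactly 1. The second vector y is invented to show two gaps of equal size, a case where the cummax() version stops incrementing.

```r
x <- c(2001,2002,2003,2004,2006,2007,2008,2009,2010,2013,2014,2015,2016)

# TRUE for the first element and for every element that does not follow its predecessor by 1
breaks <- c(TRUE, diff(x) != 1)
group  <- cumsum(breaks)
group  # 1 1 1 1 2 2 2 2 2 3 3 3 3

y <- c(2001, 2002, 2004, 2005, 2007)   # hypothetical data: two gaps of the same size
cumsum(c(TRUE, diff(y) != 1))          # 1 1 2 2 3
cummax(c(1, diff(y)))                  # 1 1 2 2 2
```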
I have a loop that finds the position in the matrix where the difference between consecutive elements is largest. For example, if thematrix[8] and thematrix[9] have the largest difference between any two consecutive elements, the number given should be 8.
I made the loop in a way that it will ignore comparisons where one of the elements is NaN (because I have some of those in my data). The loop I made looks like this.
thenumber = 0 #will store the difference
for (i in 1:nrow(thematrix) - 1) {
  if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) {
    if (abs(thematrix[i] - thematrix[i + 1]) > thenumber) {
      thenumber = i
    }
  }
}
This looks like it should work, but whenever I run it I get:
Error in if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) { :
argument is of length zero
I tried this thing but with a random number in the brackets instead of i and it works. For some reason it only doesn't work when I use the i specified in the beginning of the for-loop. It doesn't recognize that i represents a number. Why doesn't R recognize i?
Also, if there's a better way to do this task I'd appreciate it greatly if you could explain it to me
You are pretty close, but when you write i in 1:nrow(thematrix) - 1, R evaluates it as (1:nrow(thematrix)) - 1, so i starts at 0, which is what causes this issue. I would suggest using either i in 1:nrow(thematrix) or i in 2:nrow(thematrix) - 1 so that your loop starts at i = 1. I think your approach is generally pretty intuitive, but one suggestion would be to use the print() function frequently to see how i changes over the course of your loop.
The issue is that the : operator has higher precedence than -; you just need to use parentheses around (nrow(thematrix)-1). For example,
thematrix <- matrix(1:10, nrow = 5)
##
wrong <- 1:nrow(thematrix) - 1
right <- 1:(nrow(thematrix) - 1)
##
R> wrong
#[1] 0 1 2 3 4
R> right
#[1] 1 2 3 4
The error message comes from trying to access the zeroth element of thematrix, which has length zero, so is.na() of it has length zero too and if() has nothing to test:
R> thematrix[0]
integer(0)
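Putting this together, a corrected version of the loop might look like the sketch below. It assumes thematrix has a single column, uses made-up example data, and tracks the largest difference and its position in two separate variables (best_diff and best_pos are names introduced here), since the original stored the position in the same variable it compared differences against.

```r
# Hypothetical single-column data containing a NaN
thematrix <- matrix(c(1, 4, NaN, 2, 10, 3), ncol = 1)

best_diff <- -Inf   # largest absolute difference seen so far
best_pos  <- NA     # position i of that difference (between i and i + 1)

for (i in seq_len(nrow(thematrix) - 1)) {   # 1, 2, ..., nrow - 1; never 0
  d <- abs(thematrix[i] - thematrix[i + 1])
  if (!is.na(d) && d > best_diff) {         # is.na() is TRUE for NaN as well
    best_diff <- d
    best_pos  <- i
  }
}

best_pos   # 4
best_diff  # 8
```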
The other two answers address your question directly, but I must say this is about the worst possible way to solve this problem in R.
set.seed(1) # for reproducible example (the output below is from R < 3.6.0; newer versions draw a different permutation)
x <- sample(1:10,10) # numbers 1:10 in random order
x
# [1] 3 4 5 7 2 8 9 6 10 1
which.max(abs(diff(x)))
# [1] 9
The diff(...) function calculates sequential differences, and which.max(...) identifies the element number of the maximum value in a vector.
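The NaN requirement from the question is also covered, since which.max() skips missing values. A small sketch with made-up data:

```r
x <- c(1, 4, NaN, 2, 10, 3)   # hypothetical data with one NaN

d <- abs(diff(x))             # 3 NaN NaN 8 7
which.max(d)                  # 4: the largest gap is between x[4] and x[5]
```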
This is quite literally the first problem in Project Euler. I created these two algorithms to solve it, but they yield different answers. Basically, the job is to write a program that sums all the multiples of 3 or 5 that are below 1000.
Here is the correct one:
divisors <- 0
for (i in 1:999) {
  if ((i %% 3 == 0) || (i %% 5 == 0)) {
    divisors <- divisors + i
  }
}
The answer it yields is 233168
Here is the wrong one:
divisors <- 0
for (i in 1:999) {
  if (i %% 3 == 0) {
    divisors <- divisors + i
  }
  if (i %% 5 == 0) {
    divisors <- divisors + i
  }
}
This gives the answer 266333
Can anyone tell me why these two give different answers? The first is correct, and obviously the simpler solution. But I want to know why the second one isn't correct.
EDIT: fudged the second answer on accident.
Because multiples of 15 will add i once in the first code sample and twice in the second code sample. Multiples of 15 are multiples of both 3 and 5.
To make them functionally identical, the second would have to be something like:
divisors <- 0
for (i in 1:999) {
  if (i %% 3 == 0) {
    divisors <- divisors + i
  } else {
    if (i %% 5 == 0) {
      divisors <- divisors + i
    }
  }
}
But, to be honest, your first sample seems far more logical to me.
As an aside (and moot now that you've edited it), I'm also guessing that your second output value of 26633 is a typo. Unless R wraps integers around at some point, I'd expect it to be more than the first example (such as the value 266333, which I get from a similar C program, so I'm assuming you accidentally left off a 3).
I don't know R very well, but right off the bat, I see a potential problem.
In your first code block, the if statement is true if either of the conditions are true. Your second block runs the if statement twice if both conditions are met.
Consider the number 15. In your first code block, the if statement will trigger once, but in the second, both if statements will trigger, which is probably not what you want.
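The double counting can be measured directly. A vectorized sketch (the variable names either, separate and both are made up here) shows that the gap between the two totals is exactly the sum of the multiples of 15:

```r
i <- 1:999

either   <- sum(i[i %% 3 == 0 | i %% 5 == 0])             # each number added at most once
separate <- sum(i[i %% 3 == 0]) + sum(i[i %% 5 == 0])     # multiples of 15 added twice
both     <- sum(i[i %% 15 == 0])                          # the double-counted part

either             # 233168
separate           # 266333
separate - either  # 33165, the same as both
```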
I can tell you exactly why that's incorrect, conceptually.
Take the summation of all integers up to 333 and multiply it by 3, and you'll get x
Take the summation of all integers up to 199 and multiply it by 5, and you'll get y
Take the summation of all integers up to 66 and multiply it by 15, and you'll get z
x + y = 266333
x + y - z = 233168
15 is divisible by both 3 and 5. You've counted all multiples of 15 twice.
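The arithmetic can be checked with the closed form for the sum 1 + 2 + ... + n, which is n(n + 1)/2. A sketch, with tri as a made-up helper name:

```r
tri <- function(n) n * (n + 1) / 2   # sum of the integers 1..n

x <- 3  * tri(999 %/% 3)    # multiples of 3 below 1000:  3 * (1 + ... + 333)
y <- 5  * tri(999 %/% 5)    # multiples of 5 below 1000:  5 * (1 + ... + 199)
z <- 15 * tri(999 %/% 15)   # multiples of 15 below 1000: 15 * (1 + ... + 66)

x + y      # 266333
x + y - z  # 233168
```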