Unexpected result in R for loop

I don't know why this isn't giving me the desired results.
Here is my vector:
flowers = c("Flower", "Flower", "Vegatative", "Vegatative", "Dead")
Here is my for loop:
Na = 0
for (i in 1:length(flowers)){
if (i != "Dead"){
Na = Na + 1
}
}
Na
Obviously Na should equal 4, but it gives me a result of 5. When I print the flowers' status, it prints all 5. I don't want it to count the last one. What's my problem?
Thank you.

You seem to be trying to count the number of values in flowers that are not equal to "Dead". In R, the way to do this would be:
sum(flowers != "Dead")
# [1] 4
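To see why sum() gives the count, it may help to look at the intermediate logical vector. Comparing a character vector with a single string is done element by element, and sum() treats TRUE as 1 and FALSE as 0. A minimal sketch (the names is_alive and n_alive are just illustrative):

```r
flowers <- c("Flower", "Flower", "Vegatative", "Vegatative", "Dead")

# Element-wise comparison: one TRUE/FALSE per flower
is_alive <- flowers != "Dead"
print(is_alive)  # TRUE TRUE TRUE TRUE FALSE

# sum() coerces the logicals to numbers (TRUE -> 1, FALSE -> 0) and adds them up
n_alive <- sum(is_alive)
print(n_alive)   # 4
```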

The bug in your code is this line:
if (i != "Dead"){
To understand why, it would be best to print out the values of i in the loop:
for (i in 1:length(flowers)){
print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
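That is, you are iterating over numbers (the indices of the vector), not over its values, so the test compares the index itself with "Dead". An index such as 1 or 5 is never equal to "Dead", so the condition is TRUE on every iteration and Na ends up as 5. To access the values, use flowers[i]: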
for (i in 1:length(flowers)){
print(flowers[i])
}
[1] "Flower"
[1] "Flower"
[1] "Vegatative"
[1] "Vegatative"
[1] "Dead"
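Putting the two printouts side by side shows what the original test was actually checking. When R compares a number with a character string, it first converts the number to a string, so i != "Dead" asks whether "1", "2", ... differ from "Dead", which is always true. A small sketch contrasting the two tests (index_tests and value_tests are illustrative names):

```r
flowers <- c("Flower", "Flower", "Vegatative", "Vegatative", "Dead")

# Buggy test: i is an index; it is coerced to "1", "2", ... before comparing
index_tests <- sapply(seq_along(flowers), function(i) i != "Dead")

# Corrected test: compare the value stored at position i
value_tests <- sapply(seq_along(flowers), function(i) flowers[i] != "Dead")

print(index_tests)  # all TRUE, which is why Na ended up as 5
print(value_tests)  # only the last element is FALSE
```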
And so, the answer to your original question is this:
Na = 0
for (i in 1:length(flowers)){
if (flowers[i] != "Dead"){
Na = Na + 1
}
}
Na
[1] 4
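As an alternative sketch, you can loop over the values themselves, which avoids the index/value mix-up altogether. If you do need indices, seq_along() is generally safer than 1:length(): for an empty vector, 1:length(x) is c(1, 0), so the loop body would still run twice, whereas seq_along(x) is empty. The empty vector below is a hypothetical example.

```r
flowers <- c("Flower", "Flower", "Vegatative", "Vegatative", "Dead")

# Iterate over the values, not the indices
Na <- 0
for (status in flowers) {
  if (status != "Dead") {
    Na <- Na + 1
  }
}
print(Na)  # 4

# seq_along() on an empty vector yields no iterations at all
empty <- character(0)
n_iter <- 0
for (i in seq_along(empty)) {
  n_iter <- n_iter + 1
}
print(n_iter)  # 0
```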
R offers many facilities for doing computations like this without a loop; this is called vectorization. A good article on it is John Cook's "5 Kinds of Subscripts in R". For example, you could get the same result like this:
length(flowers[flowers != "Dead"])
[1] 4
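To make the link to subscripts concrete: flowers != "Dead" is a logical subscript, and which() turns it into positive integer positions. Both select the same elements, so either can be passed to length(). A minimal sketch (keep, alive_by_logical and alive_by_position are illustrative names):

```r
flowers <- c("Flower", "Flower", "Vegatative", "Vegatative", "Dead")

keep <- flowers != "Dead"                  # logical subscript
alive_by_logical <- flowers[keep]
alive_by_position <- flowers[which(keep)]  # which() gives the positions 1 2 3 4

print(alive_by_logical)
print(length(alive_by_position))  # 4
```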

Related

What is the principle behind the code of getting "for loop" output

I'm quite confused about the principle behind collecting the results of a for loop with the code x=c(x,i):
x<-c()
for(i in 1:5){
x=c(x,i)
}
x
[1] 1 2 3 4 5
Based on my understanding of for loops, I thought x=i would give the expected result, 1 2 3 4 5, but it returns only 5, the result of the last iteration of the loop (see the code below). Why can x=c(x,i) collect the result of each iteration? What is the relationship between x, the x inside c(), and i? In other words, how are values assigned between them?
I hope someone can explain it. Thank you so much!
x<-c()
for(i in 1:5){
x=i
}
x
[1] 5
x=c(x,i) collects the data because the function c() concatenates each new value i onto the end of the existing vector x, and the result is then assigned back to x.
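One way to see this is to evaluate c(x, i) on its own before assigning it: c() builds a new vector from the old contents of x followed by the new value, and x only changes when that new vector is assigned back to it. A small sketch with illustrative values:

```r
x <- c(1, 2)

# c() returns a new vector; x itself is not modified by this call
y <- c(x, 3)
print(x)  # 1 2
print(y)  # 1 2 3

# Assigning the result back to x is what makes x "grow"
x <- c(x, 3)
print(x)  # 1 2 3
```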
If you want to get more insights as to what's going on inside the loop, you can use print(x), which will display the value of x at each iteration of the loop.
x<-c()
for(i in 1:5){
x=c(x,i)
print(x)
}
# [1] 1
# [1] 1 2
# [1] 1 2 3
# [1] 1 2 3 4
# [1] 1 2 3 4 5
At each iteration, x is updated with the new value i. Without c(), each assignment would replace the previous value of x instead of extending the vector, as shown below.
x<-c()
for(i in 1:5){
x=i
print(x)
}
# [1] 1
# [1] 2
# [1] 3
# [1] 4
# [1] 5
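As @user2554330 pointed out in the comments, it is easier to think about this if you use <- instead of =: in x <- c(x, i), the right-hand side c(x, i) is evaluated first, creating a new vector from the current x plus i, and that new vector is then stored in x. So x is overwritten at each iteration in both versions; the difference is that c(x, i) carries the old values into the new vector, while x = i discards them.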
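Because c(x, i) creates a brand-new vector on every iteration, growing a vector this way gets slow for long loops. A common alternative, sketched below under the assumption that you know the number of iterations n in advance, is to preallocate x and fill it by position; for this particular example no loop is needed at all.

```r
n <- 5

# Preallocate a numeric vector of length n (filled with 0)
x <- numeric(n)
for (i in 1:n) {
  x[i] <- i  # overwrite position i instead of appending with c()
}
print(x)  # 1 2 3 4 5

# Vectorized equivalent without a loop
y <- seq_len(n)
```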

Weird behavior when trying to sort vectors in a list using a loop

I want to sort vectors in a list. I tried the following:
test <- list(c(2,3,1), c(3,2,1), c(1,2,3))
for (i in length(test)){
test[[i]] <- sort(test[[i]])
}
test
Which returns the list unchanged (vectors not sorted):
[[1]]
[1] 2 3 1
[[2]]
[1] 3 2 1
[[3]]
[1] 1 2 3
However, when I sort manually outside the loop, the sorted order is stored:
test[[1]]
[1] 2 3 1
test[[1]] <- sort(test[[1]])
test[[1]]
[1] 1 2 3
Why does the behaviour in the loop differ? I would expect the loop to store three vectors c(1,2,3) in the list. What am I missing?
I just figured it out: length(test) is the single number 3, so the loop runs only once, with i = 3. Since test[[3]] was already sorted, the list looked unchanged. Hence I should have used for (i in 1:length(test)).
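A short sketch of two ways to sort every element: seq_along(test) produces every index 1, 2, 3, and lapply() applies sort() to each element without an explicit index at all. The names sorted_loop and sorted_lapply are illustrative.

```r
test <- list(c(2, 3, 1), c(3, 2, 1), c(1, 2, 3))

# length(test) is the single number 3, so for (i in length(test)) runs once
print(length(test))

# Fix 1: loop over every index
sorted_loop <- test
for (i in seq_along(sorted_loop)) {
  sorted_loop[[i]] <- sort(sorted_loop[[i]])
}

# Fix 2: no explicit loop; lapply() returns a new list
sorted_lapply <- lapply(test, sort)
print(sorted_lapply)
```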

sapply doesn't return (numeric) vector when calculating gradients

I am calculating gradient values by using
DF$gradUx <- sapply(1:nrow(DF), function(i) ((DF$V4[i+1])-DF$V4[i]), simplify = "vector")
but when checking class(DF$gradUx), I still get a list. What I want is a numeric vector. What am I doing wrong?
Browse[1]> head(DF)
V1 V2 V3 V4
1 0 0 -2.913692e-09 2.913685e-09
2 1 0 1.574589e-05 3.443367e-09
3 2 0 2.111406e-05 3.520451e-09
4 3 0 2.496275e-05 3.613013e-09
5 4 0 2.735775e-05 3.720385e-09
6 5 0 2.892444e-05 3.841937e-09
You will only get a numeric vector when all return values have length 1. More precisely, sapply only simplifies when all return values have the same length: you get a vector if that length is 1 and a matrix if it is greater than 1. From ?sapply "Details":
Simplification in 'sapply' is only attempted if 'X' has length
greater than zero and if the return values from all elements of
'X' are all of the same (positive) length. If the common length
is one the result is a vector, and if greater than one is a matrix
with a column corresponding to each element of 'X'.
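When i == 1, your original formula evaluates DF$V4[i-1], i.e. DF$V4[0], which returns numeric(0). Since that return value has length 0 while the others have length 1, sapply cannot simplify, and the whole result is a list.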
You need to change your calculation to account for indexing outside the bounds of your vector. DF$V4[1-1] returns numeric(0), and DF$V4[nrow(DF)+1] returns NA. Fix this logic and you should remedy the vector problem.
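The simplification rule quoted above can be seen on a toy vector. In the sketch below, v is a made-up stand-in for DF$V4; the last call reproduces the bug, because v[1 - 1] is v[0], which returns numeric(0).

```r
v <- c(5, 7, 9)  # hypothetical stand-in for DF$V4

# Every call returns exactly one value -> numeric vector
as_vector <- sapply(1:3, function(i) v[i] * 2)

# Every call returns two values -> 2 x 3 matrix, one column per element
as_matrix <- sapply(1:3, function(i) c(i, v[i]))

# The first call returns numeric(0), the others one value -> a list
as_list <- sapply(1:3, function(i) v[i - 1])

print(class(as_vector))
print(dim(as_matrix))
print(as_list)
```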
Edit: for the record, the original question incorrectly calculated the difference as (DF$V4[i+1]) - DF$V4[i-1], giving a lag-2 difference, whereas the edited question (and the OP's intent) uses a lag-1 difference.
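Instead of sapply, I should simply use diff(DF$V3) and write the result into a new data.frame, since diff() returns one value fewer than its input and so cannot be assigned directly as a column of DF: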
gradients = data.frame(gradUx=diff(DF$V3),gradUy=diff(DF$V4))
This calculation can be vectorized very easily if you line up the observations. I use head and tail to drop the first 2 and last 2 observations:
gradUx <- c(NA, tail(df$V4, -2) - head(df$V4, -2), NA)
> gradUx
[1] NA 6.06766e-10 1.69646e-10 1.99934e-10 2.28924e-10 NA
This gives the same values as your approach, but as a vector; your sapply call returns them as a list:
> sapply(1:nrow(df), function(i) ((df$V4[i+1])-df$V4[i-1]), simplify = "vector")
[[1]]
numeric(0)
[[2]]
[1] 6.06766e-10
[[3]]
[1] 1.69646e-10
[[4]]
[1] 1.99934e-10
[[5]]
[1] 2.28924e-10
[[6]]
[1] NA
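For completeness, a sketch using the six V4 values printed by head(DF) in the question: diff() computes lag-1 differences and, through its lag argument, the lag-2 differences from the head/tail answer, while sapply() returns a numeric vector once the out-of-range cases explicitly return a single NA. Padding with NA so the result lines up with the rows of DF is an assumption about what you want.

```r
# The six V4 values shown by head(DF) in the question
V4 <- c(2.913685e-09, 3.443367e-09, 3.520451e-09,
        3.613013e-09, 3.720385e-09, 3.841937e-09)
n <- length(V4)

# Lag-1 difference (V4[i+1] - V4[i]), padded with NA at the end
grad_lag1 <- c(diff(V4), NA)

# Lag-2 difference (V4[i+1] - V4[i-1]), padded with NA at both ends
grad_lag2 <- c(NA, diff(V4, lag = 2), NA)

# The same lag-2 values via sapply: every call now returns exactly one value
grad_sapply <- sapply(seq_len(n), function(i) {
  if (i == 1 || i == n) NA_real_ else V4[i + 1] - V4[i - 1]
})

print(is.numeric(grad_sapply))  # TRUE
```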

How do you determine which element in a list contains a value matching some other value?

If I have the following list:
a <- list(1:3, 4:5, 6:9)
a
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5
[[3]]
[1] 6 7 8 9
I want to determine which element of the list a specific value is in. For example, I might want to find which element the number 5 falls under. In this case it would be [[2]].
My goal is to have something like
match(5,a)
return the value 2.
However, match() only checks whether the number is equal to an entire element of the list, so it returns NA:
match(5,a)
[1] NA
And matching against unlist(a) only tells me where my number of interest falls among all the values combined:
match(5,unlist(a))
[1] 5
Thoughts?
You can use the grep function:
grep(5, a)
# [1] 2
grep(9, a)
# [1] 3
Updated Answer
After reading @nicola's comment, I realized that grep only works for numbers at the start or end of each list element, not for numbers in between. This is because grep converts each element to a character string such as "6:9" and searches that text, so 8 is never found.
You can try the code below for a complete solution:
a <- list(1:3, 4:5, 6:9)
df <- data.frame(unlist(a))
df$group <- 0
k <- 1
x <- integer(length(a))  # number of values in each list element
for(i in 1:length(a))
{
x[i] <- length(a[[i]])
for(j in 1:x[i])
{
df$group[k] <- i  # value k of unlist(a) belongs to list element i
k <- k+1
}
}
colnames(df)[1] <- "num"
df[df$num == 5, ]$group
# [1] 2
df[df$num == 9, ]$group
# [1] 3
df[df$num == 8, ]$group
# [1] 3
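A more vectorized sketch of the same lookup: either test each list element with %in%, or label every value of unlist(a) with the index of the element it came from (lengths() gives the size of each element) and then use match(). The second option assumes each number appears in at most one element, since match() returns only the first hit. The helper find_element() is a hypothetical name.

```r
a <- list(1:3, 4:5, 6:9)

# Option 1: which list elements contain the value?
find_element <- function(value, lst) {
  which(sapply(lst, function(v) value %in% v))
}
print(find_element(5, a))  # 2
print(find_element(8, a))  # 3

# Option 2: element index for every value of unlist(a), then match()
group <- rep(seq_along(a), lengths(a))  # 1 1 1 2 2 3 3 3 3
print(group[match(5, unlist(a))])       # 2
```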

counting vectors with NA included

By accident, I found that R counts the elements of a vector that contains NA in an interesting way:
> temp <- c(NA,NA,NA,1) # 4 items
> length(temp[temp>1])
[1] 3
> temp <- c(NA,NA,1) # 3 items
> length(temp[temp>1])
[1] 2
At first I assumed R would collapse all the NAs into one NA, but this is not the case.
Can anyone explain? Thanks.
You were expecting only TRUEs and FALSEs (and the results to be all FALSE), but a logical vector can also contain NAs. If you were hoping for a zero-length result, you had at least three other choices:
> temp <- c(NA,NA,NA,1) # 4 items
> length(temp[ which(temp>1) ] )
[1] 0
> temp <- c(NA,NA,NA,1) # 4 items
> length(subset( temp, temp>1) )
[1] 0
> temp <- c(NA,NA,NA,1) # 4 items
> length( temp[ !is.na(temp) & temp>1 ] )
[1] 0
You will find the last form in a lot of the internal code of well-established functions. I happen to think the first version is more economical and easier to read, but R Core seems to disagree: I have been advised several times on R-help not to use which() around logical expressions. I remain unconvinced. It is true, however, that one should not combine which() with negative indexing.
EDIT: The reason not to use the "minus which" construct (negative indexing with which()) is that when every item fails the which() test, so that you would expect all of them to be returned, it instead returns an empty vector:
temp <- c(1,2,3,4,NA)
temp[!temp > 5]
#[1] 1 2 3 4 NA As expected
temp[-which(temp > 5)]
#numeric(0) Not as expected
temp[!temp > 5 & !is.na(temp)]
#[1] 1 2 3 4 A correct way to handle negation
I admit that the notion that NAs should select NA elements seems a bit odd, but it is rooted in the history of S and therefore R. There is a section in ?"[" titled "NAs in indexing". The rationale is that each NA used as an index should return an unknown result, i.e. another NA.
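A few toy examples of the indexing rule described above (x is an illustrative vector). Note the last case: a bare NA is a logical NA, so it is recycled to the length of x like any logical subscript and returns one NA per element rather than a single NA.

```r
x <- c(10, 20, 30)

# A numeric NA index picks an "unknown" element
print(x[c(1, NA)])            # 10 NA

# A logical NA index also yields NA in that position
print(x[c(TRUE, NA, FALSE)])  # 10 NA

# A bare NA is logical and is recycled: one NA per element of x
print(x[NA])                  # NA NA NA
```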
If you break down each command and look at the output, it's more enlightening:
> tmp = c(NA, NA, 1)
> tmp > 1
[1] NA NA FALSE
> tmp[tmp > 1]
[1] NA NA
So, when we then evaluate length(tmp[tmp > 1]), it is as if we were executing length(c(NA, NA)). A vector full of NAs is perfectly valid and has a definite length, just as if we had created it via NA * vector(length = 2), which differs from NA * vector(length = 3).
You can use 'sum':
> tmp <- c(NA, NA, NA, 3)
> sum(tmp > 1)
[1] NA
> sum(tmp > 1, na.rm=TRUE)
[1] 1
A bit of explanation: 'sum' expects numbers but 'tmp > 1' is logical. So it is automatically coerced to be numeric: TRUE => 1; FALSE => 0; NA => NA.
I don't think there is anything precisely like this in 'The R Inferno', but this is definitely the sort of question it is aimed at. http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
