So I know to determine the first occurrence of a specific element in each row you use the apply function with which.max or which.min. Here is the code that I am using right now.
x <- matrix(c(20,9,4,16,6,2,14,3,1),nrow=3)
x
apply(3 >= x,1,which.max )
This produces an output of:
[1] 1 3 2
Now when I try to do the same thing on a different matrix "x2"
x2 <- matrix(c(3,9,4,16,6,2,14,3,1),nrow=3)
x2
apply(3 >= x2,1,which.max )
The output is the same;
[1] 1 3 2
But for "x2" it is correct because the "x2" matrix's first row does have a value less than or equal to three.
Now my question, which is probably something simple: why do the apply calls produce the same thing for "x" and "x2"? For "x" I would want something like:
[1] 0 3 2
Or maybe even something like this:
[1] NA 3 2
I have seen questions on stack overflow before on which.max not producing NAs and the answer was to just use the which() function, but since I am using a matrix and I want the first occurrence I do not have that luxury... I think.
We could replace the values in 'x' that are >3 with a very small number, e.g. -999 or any value lower than the minimum in the dataset. Get the index from the replaced vector with which.max and multiply by a logical index to handle rows where every value was replaced. In the case of 'x', the first row is all greater than 3, so after replacing with -999, which.max returns 1 as the index, but we prefer NA or 0. With sum(x1>0), the first row gives 0; negating (!) converts it to TRUE, and negating once more returns FALSE. Multiplying by the logical index coerces it to binary (0/1), and we get 0 for that row. Note that which.max here picks the position of the largest value that is <= 3, which matches the first such position for these matrices.
apply(x, 1, function(x) {x1 <- ifelse(x>3, -999, x)
which.max(x1)*(!!sum(x1>0))})
#[1] 0 3 2
apply(x2, 1, function(x) {x1 <- ifelse(x>3, -999, x)
which.max(x1)*(!!sum(x1>0))})
#[1] 1 3 2
Another option is using max.col
x1 <- replace(x, which(x>3), -999)
max.col(x1)*!!rowSums(x1>0)
#[1] 0 3 2
x2N <- replace(x2, which(x2>3), -999)
max.col(x2N)*!!rowSums(x2N>0)
#[1] 1 3 2
Or a slight modification would be
indx <- x*(x <=3)
max.col(indx)*!!rowSums(indx)
#[1] 0 3 2
Put a column in front of '(3>=x)' that is Infinity, if and only if all entries in the corresponding row of 'x' are larger than 3, and otherwise NaN. Then apply 'which.max' rowwise, and finally subtract 1, because of the extra column:
x <- matrix(c(20,9,4,16,6,2,14,3,1),nrow=3)
a <- (!apply(3>=x,1,max))*Inf
apply( cbind(a,3>=x), 1, which.max ) - 1
This gives '0,3,2'. Here 'which.max' is applied to the extended matrix:
> cbind(a,3>=x)
a
[1,] Inf 0 0 0
[2,] NaN 0 0 1
[3,] NaN 0 1 1
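For comparison, here is a minimal sketch of the NA-returning variant the question asks for. It is a different technique from the answers above: it takes the first element of which() per row, which is NA when no entry qualifies. The helper name first_at_most is hypothetical.

```r
# Hypothetical helper: position of the first entry <= threshold in each row,
# or NA when a row has no such entry.
first_at_most <- function(m, threshold) {
  # which() on an all-FALSE row gives integer(0); indexing that with [1] gives NA
  apply(m <= threshold, 1, function(r) which(r)[1])
}

x  <- matrix(c(20, 9, 4, 16, 6, 2, 14, 3, 1), nrow = 3)
x2 <- matrix(c(3, 9, 4, 16, 6, 2, 14, 3, 1), nrow = 3)

res_x  <- first_at_most(x, 3)   # NA 3 2
res_x2 <- first_at_most(x2, 3)  # 1 3 2
```

Unlike the which.max-based versions, this does not depend on the ordering of the values within a row.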
In R, I saw that if we subset a length-one vector holding -1 with negative indices, we get -1. If a 1 is among those negative indices, we get numeric(0), and if positive numbers are the indices, we get NAs. Why is this?
> V <- -1
> V[-c(3,4)]
[1] -1
> V[-c(1,3,4)]
numeric(0)
> V[c(1,3,4)]
[1] -1 NA NA
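In the first case, positions 3 and 4 don't exist, so there is nothing to remove and the original vector is returned. In the second case, index 1 is among the negative indices, so the only element is removed, which results in numeric(0). In the third case, with positive indices, the third and fourth elements don't exist, so NA is returned for each.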
c(1, 4, 3)[c(5, 6)] # // it is vector of length 3, so 5th and 6th doesn't exist
#[1] NA NA
c(1, 4, 3)[-c(5, 6)] # // no values in 5th and 6th to remove
#[1] 1 4 3 # // so it returns the original vector
In the OP's case
V[-1] # // returns numeric(0) as the first and only element is removed
#numeric(0)
The following is related to the R language.
x1 <- c(1, 4, 3, NA, 7)
is.na(x1) <- which(x1 == 7)
I don't understand: the LHS in the last line gives you a vector of booleans and the RHS is a value (the index where x1 == 7, 5 in this case). So what does it mean to assign a boolean vector a value of 5?
is.na from the docs returns:
The default method for is.na applied to an atomic vector returns a logical vector of the same length as its argument x, containing TRUE for those elements marked NA or, for numeric or complex vectors, NaN, and FALSE otherwise.
Used on the left-hand side of an assignment, however, it is the replacement function is.na<- that is called. In essence you're saying that wherever the indices on the right point, this should be an NA.
By passing the index from which, you're turning the element at that position into an NA, hence the change.
To put it in practice:
This is the output from is.na(x1):
is.na(x1)
[1] FALSE FALSE FALSE TRUE FALSE
The corresponding output from which(x1 == 7):
which(x1 == 7)
[1] 5
Combining, the element at position 5 will now become an NA, so is.na() returns TRUE for it:
is.na(x1) <- which(x1 == 7)
x1
[1] 1 4 3 NA NA
Indices past the end of the vector extend it. Starting again from the original x1, the following turns the first element into an NA and appends two more NAs so as to make index 7 an NA:
x1 <- c(1, 4, 3, NA, 7)
is.na(x1) <- c(1,7)
x1
[1] NA 4 3 NA 7 NA NA
Compare with this example from the docs:
(xx <- c(0:4))
is.na(xx) <- c(2, 4)
xx
[1] 0 NA 2 NA 4
From the above, it is clear that c(2,4) refers to positions in xx, hence the elements at positions 2 and 4 become NAs and the rest are unchanged.
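As a minimal sketch of what the assignment form does, the following compares it with plain indexed assignment of NA, which is how the default is.na<- method behaves. The variable names via_is_na and via_index are only for illustration.

```r
x1 <- c(1, 4, 3, NA, 7)

# Assignment form: calls the replacement function `is.na<-`
via_is_na <- x1
is.na(via_is_na) <- which(via_is_na == 7)

# Plain indexed assignment of NA at the same position
via_index <- x1
via_index[which(via_index == 7)] <- NA

same_result <- identical(via_is_na, via_index)
```

Both vectors end up with NA in positions 4 and 5, so same_result is TRUE.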
I have the following matrices :
> matrix <- matrix(c(1,3,4,NA,NA,NA,3,0,4,6,0,NA,2,NA,NA,2,0,1,0,0), nrow=5,ncol=4)
> n <- matrix(c(1,2,5,6,2),nrow=5,ncol=1)
As you can see, each row has:
- multiple NAs (the number of NAs varies)
- ONE single "0"
I would like to replace each 0 with the value of n for that row. Intended output below.
> output <- matrix(c(1, 3, 4,NA,NA,NA,3,5,4,6,1,NA,2,NA,NA,2,2,1,6,2), nrow=5,ncol=4)
I have tried the following
subset <- matrix == 0 & !is.na(matrix)
matrix[subset] <- n
#does not give intended output, but subset locates the values i want to change
When used on my "real" data i get the following message :
Warning message: In m[subset] <- n : number of items to replace is not
a multiple of replacement length
Thanks
EDIT: added a row to the matrix, as my real-life problem is with an unbalanced matrix. I am using matrices and not data frames here because I think (not sure) that with very large datasets R is quicker with large matrices than with subsets of data frames.
We can do this using
out1 <- matrix+n[row(matrix)]*(matrix==0)
identical(output, out1)
#[1] TRUE
It appears you want to replace the values by row, but assigning through a logical index fills the selected positions in column-major order (down each column first). Transposing the matrix before the assignment, and transposing back afterwards, will get the desired output:
matrix <- t(matrix)
subset <- matrix == 0 & !is.na(matrix)
matrix[subset] <- n
matrix <- t(matrix)
setequal(output, matrix)
[1] TRUE
You can try this option with ifelse:
ifelse(matrix == 0, c(n) * (matrix == 0), matrix)
#     [,1] [,2] [,3] [,4]
#[1,]    1   NA    1    2
#[2,]    3    3   NA    2
#[3,]    4    5    2    1
#[4,]   NA    4   NA    6
#[5,]   NA    6   NA    2
zero = matrix == 0
identical(ifelse(zero, c(n) * zero, matrix), output)
# [1] TRUE
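Another possible sketch, not taken from the answers above, looks up the row of each zero with which(arr.ind = TRUE) and indexes n by that row. The names m and zero_pos are illustrative; m is used instead of matrix to avoid masking the base function.

```r
m <- matrix(c(1, 3, 4, NA, NA, NA, 3, 0, 4, 6, 0, NA, 2, NA, NA, 2, 0, 1, 0, 0),
            nrow = 5, ncol = 4)
n <- c(1, 2, 5, 6, 2)
output <- matrix(c(1, 3, 4, NA, NA, NA, 3, 5, 4, 6, 1, NA, 2, NA, NA, 2, 2, 1, 6, 2),
                 nrow = 5, ncol = 4)

# Row/column positions of the zeros; which() skips NA comparisons
zero_pos <- which(m == 0, arr.ind = TRUE)

# Matrix indexing: each zero is replaced by the n value for its own row
m[zero_pos] <- n[zero_pos[, "row"]]

matches <- identical(m, output)
```

Because each replacement is looked up by row, this does not rely on there being exactly one zero per row.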
I am calculating gradient values by using
DF$gradUx <- sapply(1:nrow(DF), function(i) ((DF$V4[i+1])-DF$V4[i]), simplify = "vector")
but when checking class(DF$gradUx), I still get a list. What I want is a numeric vector. What am I doing wrong?
Browse[1]> head(DF)
V1 V2 V3 V4
1 0 0 -2.913692e-09 2.913685e-09
2 1 0 1.574589e-05 3.443367e-09
3 2 0 2.111406e-05 3.520451e-09
4 3 0 2.496275e-05 3.613013e-09
5 4 0 2.735775e-05 3.720385e-09
6 5 0 2.892444e-05 3.841937e-09
You will only get a numeric vector when all return values are of length 1. More accurately, you will get an array if all return values are the same length. From ?sapply "Details":
Simplification in 'sapply' is only attempted if 'X' has length
greater than zero and if the return values from all elements of
'X' are all of the same (positive) length. If the common length
is one the result is a vector, and if greater than one is a matrix
with a column corresponding to each element of 'X'.
When i == 1, the index i-1 is 0, so your formula will return numeric(0), and the whole return will be a list.
You need to change your calculation to account for indexing outside the bounds of your vector. DF$V4[1-1] returns numeric(0), and DF$V4[nrow(DF)+1] returns NA. Fix this logic and you should remedy the vector problem.
Edit: for historical reasons, the original question calculated the difference as DF$V4[i+1]-DF$V4[i-1], giving a lag-2 difference, whereas the recently-edited question (and the OP's intent) shows a lag-1 difference.
Instead of sapply I should simply use diff(DF$V3) and write it into a new data.frame:
gradients = data.frame(gradUx=diff(DF$V3),gradUy=diff(DF$V4))
This calculation can be vectorized very easily if you line up the observations. I use head and tail to drop the first 2 and last 2 observations:
gradUx <- c(NA, tail(DF$V4, -2) - head(DF$V4, -2), NA)
> gradUx
[1] NA 6.06766e-10 1.69646e-10 1.99934e-10 2.28924e-10 NA
Which provides the same values as your approach, in vector form:
> sapply(1:nrow(DF), function(i) ((DF$V4[i+1])-DF$V4[i-1]), simplify = "vector")
[[1]]
numeric(0)
[[2]]
[1] 6.06766e-10
[[3]]
[1] 1.69646e-10
[[4]]
[1] 1.99934e-10
[[5]]
[1] 2.28924e-10
[[6]]
[1] NA
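The simplification rule can be seen in a small sketch. It uses only the first four V4 values from the question, and the names v4, as_list and lag1 are illustrative. The lag-1 version pads the end with NA so that it has one value per row.

```r
v4 <- c(2.913685e-09, 3.443367e-09, 3.520451e-09, 3.613013e-09)

# At i = 1 the index i - 1 is 0, so that element is numeric(0);
# the return lengths differ and sapply falls back to a list.
as_list <- sapply(seq_along(v4), function(i) v4[i + 1] - v4[i - 1])

# Vectorized lag-1 difference, padded so its length matches v4
lag1 <- c(diff(v4), NA)

result_is_list <- is.list(as_list)
```

Here result_is_list is TRUE, while lag1 is an ordinary numeric vector that can be assigned to a data frame column directly.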
I don't find the help page for the replace function from the base package to be very helpful. Worst part, it has no examples which could help understand how it works.
Could you please explain how to use it? An example or two would be great.
If you look at the function (by typing its name at the console) you will see that it is just a simple functionalized version of the [<- function, which is described at ?"[". [ is a rather basic function in R, so you would be well-advised to look at that page for further details. Especially important is learning that the index argument (the second argument in replace) can be logical, numeric or character valued. Recycling will occur when the second and third arguments differ in length.
You should "read" the function call as: "within the first argument, use the second argument as an index for placing the values of the third argument into the first":
> replace( 1:20, 10:15, 1:2)
[1] 1 2 3 4 5 6 7 8 9 1 2 1 2 1 2 16 17 18 19 20
Character indexing for a named vector:
> replace(c(a=1, b=2, c=3, d=4), "b", 10)
a b c d
1 10 3 4
Logical indexing:
> replace(x <- c(a=1, b=2, c=3, d=4), x>2, 10)
a b c d
1 2 10 10
You can also use logical tests
x <- data.frame(a = c(0,1,2,NA), b = c(0,NA,1,2), c = c(NA, 0, 1, 2))
x
x$a <- replace(x$a, is.na(x$a), 0)
x
x$b <- replace(x$b, x$b==2, 333)
Here are two simple examples
> x <- letters[1:4]
> replace(x, 3, 'Z') #replacing 'c' by 'Z'
[1] "a" "b" "Z" "d"
>
> y <- 1:10
> replace(y, c(4,5), c(20,30)) # replacing 4th and 5th elements by 20 and 30
[1] 1 2 3 20 30 6 7 8 9 10
Be aware that in the examples given above, the third parameter (values) is a constant (e.g. 'Z' or c(20,30)).
Defining the third parameter using values from the data frame itself can lead to confusion.
E.g. with a simple data frame such as this (using dplyr::data_frame):
tmp <- data_frame(a=1:10, b=sample(LETTERS[24:26], 10, replace=T))
This will create something like this:
a b
(int) (chr)
1 1 X
2 2 Y
3 3 Y
4 4 X
5 5 Z
..etc
Now suppose what you wanted to do was to multiply the values in column 'a' by 2, but only where column 'b' is "X". My immediate thought would be something like this:
with(tmp, replace(a, b=="X", a*2))
That will not provide the desired outcome, however. The a*2 will be evaluated as a fixed vector rather than as a reference to the 'a' column. The vector 'a*2' will thus be
[1] 2 4 6 8 10 12 14 16 18 20
at the start of the 'replace' operation. Thus, at the first row where 'b' equals "X", the value in 'a' will be replaced by 2. The second time, it will be replaced by 4, etc ... it will not be replaced by two-times-the-value-of-a in that particular row.
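One way around this, sketched here with a base data.frame and a fixed 'b' column instead of a random sample so that the result is reproducible, is to subset the replacement values with the same condition used as the index. The names is_x and doubled are illustrative.

```r
# Fixed data (an assumption for reproducibility): 'b' is "X" on odd rows
tmp <- data.frame(a = 1:10, b = rep(c("X", "Y"), 5), stringsAsFactors = FALSE)

is_x <- tmp$b == "X"

# Index and replacement values now line up one-to-one
doubled <- replace(tmp$a, is_x, tmp$a[is_x] * 2)
```

With this data, doubled is 2 2 6 4 10 6 14 8 18 10: each "X" row holds twice its own value of 'a', and the other rows are unchanged.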
Here's an example where I found the replace( ) function helpful for giving me insight. The problem required a long integer vector be changed into a character vector and with its integers replaced by given character values.
## figuring out replace( )
(test <- c(rep(1,3),rep(2,2),rep(3,1)))
which looks like
[1] 1 1 1 2 2 3
and I want to replace every 1 with an A and 2 with a B and 3 with a C
letts <- c("A","B","C")
so in my own secret little "dirty-verse" I used a loop
for(i in 1:3)
{test <- replace(test,test==i,letts[i])}
which did what I wanted
test
[1] "A" "A" "A" "B" "B" "C"
In the first sentence I purposefully left out that the real objective was to make the big vector of integers a factor vector and assign the integer values (levels) some names (labels).
So another way of doing the replace( ) application here would be
(test <- factor(test,labels=letts))
[1] A A A B B C
Levels: A B C
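Since the integers here are exactly the positions 1 to 3 in letts, a loop-free sketch is to use them directly as an index. This relies on that assumption and would need a lookup step for other codes; the name by_index is illustrative.

```r
test  <- c(rep(1, 3), rep(2, 2), rep(3, 1))
letts <- c("A", "B", "C")

# Each integer in 'test' selects the letter at that position in 'letts'
by_index <- letts[test]
```

The result is the same character vector as the one produced by the replace( ) loop.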