Why does isTRUE improve this integer comparison - r

I just started with R and I've come across an issue I can fix but not quite understand.
Consider this simple code:
foo <- function(v) {
for(i in 1:length(v)-1)
if(v[i] > v[i+1])
#do something here
return()
}
v <- c(10, 40, 40, 10, 20, 70, 30, 20)
foo(v)
Running it will give this error:
Error in if (v[i] > v[i + 1]) return() : argument is of length zero
But replacing the if with the following code gets rid of the error:
if(isTRUE(v[i] > v[i+1]))
I come from a C/Java background so my question is, why? Why does this simple integer comparison need to be wrapped in isTRUE to work?
On similar questions I've found that isTRUE helps protect against cases where one of the two arguments is NA or NULL, but why is this the case here with two numbers?

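1:length(v)-1 is interpreted as (1:length(v))-1, because : binds more tightly than the binary minus. R vectors are indexed from 1, so the first iteration runs with i = 0; v[0] is a zero-length vector, the comparison v[0] > v[1] is logical(0), and if() stops with "argument is of length zero". You should instead have 1:(length(v)-1):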
> length(v)
[1] 8
> 1:length(v)-1
[1] 0 1 2 3 4 5 6 7
> 1:(length(v)-1)
[1] 1 2 3 4 5 6 7
> v[9]
[1] NA
Complete function:
foo <- function(v) {
for(i in 1:(length(v)-1))
{
if(v[i] > v[i+1])
{
#do something here
}
}
return()
}
v <- c(10, 40, 40, 10, 20, 70, 30, 20)
> foo(v)
# NULL
isTRUE(x) returns TRUE if, and only if, x is a single TRUE value; NA and zero-length results both give FALSE. This means that:
> isTRUE(NA)
[1] FALSE
However:
> NA == TRUE
[1] NA
(not FALSE)
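To see both effects directly, here is a small sketch (the names idx_wrong, idx_right and first_cmp are only for illustration). It shows that : is evaluated before the binary minus, that indexing with 0 yields a zero-length vector, and that isTRUE() turns the resulting zero-length comparison into FALSE, which is why wrapping the condition hid the error rather than fixing the loop bounds:

```r
v <- c(10, 40, 40, 10, 20, 70, 30, 20)

idx_wrong <- 1:length(v) - 1          # parsed as (1:length(v)) - 1, so it starts at 0
idx_right <- seq_len(length(v) - 1)   # 1..7, and empty when length(v) is 1

first_cmp <- v[0] > v[1]   # v[0] is numeric(0), so this is logical(0)
length(first_cmp)          # 0: if() cannot use a zero-length condition
isTRUE(first_cmp)          # FALSE: the i = 0 iteration is silently skipped
```

With isTRUE() the loop still visits i = 0 and never compares v[7] with v[8] correctly shifted, so fixing the range is the actual repair.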

Related

Does `gmp` library disregard index ordering in `[<-`?

Consider these two examples:
> foo <- 1:5
> foo2 <- c(10,20)
> foo[3:2] <- foo2
> foo
[1] 1 20 10 4 5
> bar <- as.bigz(1:5)
> bar2 <- as.bigz(c(10,20))
> bar[3:2] <- bar2
> bar
Big Integer ('bigz') object of length 5:
[1] 1 10 20 4 5
Am I missing something in how bigz objects are indexed, or is this a bug in the library?
Added: gmp 0.6.5 and R-windows 4.2.0.
This is likely irrelevant, but here is what the debugger shows when stepping into [ on a bigz object:
a_5 <- as.bigz(c(5,4,3,2,1))
a_5[2:3]
debugging in: `[.bigz`(a_5, 2:3)
debug at /home/chris/r_TMPDIR/RtmpVMOZqo/R.INSTALL4bd8b27ba6a92/gmp/R/biginteger.R#480: {
mdrop <- missing(drop)
Narg <- nargs() - (!mdrop)
matrixAccess = Narg > 2
has.j <- !missing(j)
if (!is.null(attr(x, "nrow")) & matrixAccess) {
.Call(matrix_get_at_z, x, i, j)
}
else {
if (has.j)
stop("invalid vector subsetting")
r <- .Call(biginteger_get_at, x, i)
attr(r, "nrow") <- NULL
r
}
}
Browse[2]> Q
So plain extraction lands in `[.bigz`(a_5, 2:3), our first [, as expected, whereas with a subset assignment:
bar = as.bigz(c(300, 400))
a_5[2:3] <- bar[2:1]
debugging in: `[.bigz`(bar, 2:1)
debug at /home/chris/r_TMPDIR/RtmpVMOZqo/R.INSTALL4bd8b27ba6a92/gmp/R/biginteger.R#480: {
mdrop <- missing(drop)
Narg <- nargs() - (!mdrop)
matrixAccess = Narg > 2
has.j <- !missing(j)
if (!is.null(attr(x, "nrow")) & matrixAccess) {
.Call(matrix_get_at_z, x, i, j)
}
else {
if (has.j)
stop("invalid vector subsetting")
r <- .Call(biginteger_get_at, x, i)
attr(r, "nrow") <- NULL
r
}
}
Here the debugger stops on the right-hand side, in `[.bigz`(bar, 2:1). For comparison, the same kind of assignment with ordinary numeric values gives the expected result:
a_norm <- c(5,4,3,2,1)
b_nums <- c(200, 300)
a_norm[2:3] = b_nums[2:1]
a_norm
[1] 5 300 200 2 1
where both the LHS ('where we want it') and the RHS ('what it is') are handled in a single line. Note that I couldn't get debug to trigger in this instance.
Feature or flaw?
I received confirmation from one of the authors (M Maechler) that this is one of several bugs in gmp related to indexing and subsetting. There are also problems when attempting to run certain apply functions, BTW. We'll just have to wait for the next version to be released.

Retaining maximum of a vector into another vector in R

I want to retain the maximum value in a vector. R code is written below.
How to fix this code so it runs without errors?
dat is in a data frame
dat=c(3, 5, 4, 2, 8, NA, NA, 9, 10, 3)
desired output is
MaxRuns=c(3,5,5,5,8,8,8,9,10,10)
maxValue=function(dat){
maxv=0
for (i in 1:10) MaxRuns(i)=0
for (i in 1:10){
if dat(i) > maxv {
maxv=dat(i) }
MaxRuns(i)=maxv
}
return(maxv)
}
maxValue<-maxValue(dat)
maxValue
Errors:
dat=c(3, 5, 4, 2, 8, NA, NA, 9, 10, 3)
> maxValue=function(dat){
+ maxv=0
+ for (i in 1:10) MaxRuns(i)=0
+ for (i in 1:10){
+ if dat(i) > maxv {
Error: unexpected symbol in:
" for (i in 1:10){
if dat"
> maxv=dat(i) }
Error: unexpected '}' in " maxv=dat(i) }"
> MaxRuns(i)=maxv
Error: object 'maxv' not found
> }
Error: unexpected '}' in " }"
> return(maxv)
Error: object 'maxv' not found
> }
Error: unexpected '}' in " }"
> maxValue<-maxValue(dat)
Error in maxValue(dat) : could not find function "maxValue"
> maxValue
Error: object 'maxValue' not found
Thank you. MM
This looks like cummax, but you need to handle the NAs. Since all values in dat are positive, the NAs are replaced with 0 here.
cummax(replace(dat, is.na(dat), 0))
#[1] 3 5 5 5 8 8 8 9 10 10
As mentioned by @Dason, replacing the NA values with the minimum of dat makes this general:
cummax(replace(dat, is.na(dat), min(dat, na.rm = TRUE)))
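Another option, if the data may contain zero or negative values, is to replace the NAs with -Inf, which can never raise the running maximum. A minimal sketch (the helper name running_max is made up for this example; it assumes the input holds no genuine -Inf values and turns leading NAs back into NA):

```r
running_max <- function(x) {
  filled <- replace(x, is.na(x), -Inf)    # -Inf never beats an observed value
  out <- cummax(filled)
  out[is.infinite(out) & out < 0] <- NA   # nothing observed yet at this position
  out
}

dat <- c(3, 5, 4, 2, 8, NA, NA, 9, 10, 3)
running_max(dat)
#[1]  3  5  5  5  8  8  8  9 10 10
```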
You access each element of a vector with square brackets ([]), not round brackets (()). I would write the loop something like this.
maxv = integer(length = length(dat))
current_max = 0
for (i in seq_along(dat)) {
if (!is.na(dat[i]) && dat[i] > current_max) {
current_max <- dat[i]
}
maxv[i] <- current_max
}
maxv
#[1] 3 5 5 5 8 8 8 9 10 10
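The NA check in the condition matters because a comparison with NA returns NA, and if() refuses an NA condition. A short illustration of how the logical operators treat NA, using only the dat vector from the question:

```r
dat <- c(3, 5, 4, 2, 8, NA, NA, 9, 10, 3)

dat[6] > 0                      # NA: the comparison itself is unknown
dat[6] > 0 & !is.na(dat[6])     # FALSE, because NA & FALSE is FALSE
!is.na(dat[6]) && dat[6] > 0    # FALSE, and the comparison is never evaluated
```

Putting the is.na() test first with && is the more common idiom inside if(), since && short-circuits and is meant for single values.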
There are a few things to fix.
You can check if an entry is NA by using is.na. Also, when you access the entries of dat, use dat[i] rather than dat(i). Also, try not to give a variable the same name as a function.
dat=c(3, 5, 4, 2, 8, NA, NA, 9, 10, 3)
maxValue=function(dat){
maxv=0
MaxRuns = rep(0, length(dat))
for (i in seq_along(dat)){
if (!is.na(dat[i]) && dat[i] > maxv){
maxv=dat[i] }
MaxRuns[i]=maxv
}
return(MaxRuns)
}
maxRuns<-maxValue(dat)
print(maxRuns)
prints out
[1] 3 5 5 5 8 8 8 9 10 10

Shuffling and combining two vectors [duplicate]

I would like to merge 2 vectors this way:
a = c(1,2,3)
b = c(11,12,13)
merged vector : c(1,11,2,12,3,13)
How could I do it?
This will work using rbind :
c(rbind(a, b))
For example:
a = c(1,2,3)
b = c(11,12,13)
c(rbind(a,b))
#[1] 1 11 2 12 3 13
Explanation
This works because R stores arrays in column-major order.
When you rbind() the two vectors, you get:
rbind_result <- rbind(a, b)
rbind_result
#   [,1] [,2] [,3]
# a    1    2    3
# b   11   12   13
Then c() coerces rbind_result into a column-wise flattened vector:
merged <- c(rbind_result)
merged
# [1] 1 11 2 12 3 13
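The same column-major flattening extends to more than two vectors of equal length. A small sketch (the third vector d is made up for illustration):

```r
a <- c(1, 2, 3)
b <- c(11, 12, 13)
d <- c(21, 22, 23)

m <- rbind(a, b, d)   # 3 x 3 matrix, one input vector per row
merged3 <- c(m)       # reads down each column: a[1], b[1], d[1], a[2], ...
merged3
#[1]  1 11 21  2 12 22  3 13 23
```

With vectors of unequal length rbind() recycles the shorter one, so this approach only suits inputs of the same length.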
The rbind() answer by @jalapic is excellent. Here's an alternative that creates a new vector then assigns the alternating values to it.
a <- c(1,2,3)
b <- c(11,12,13)
x <- vector(class(a), length(c(a, b)))
x[c(TRUE, FALSE)] <- a
x[c(FALSE, TRUE)] <- b
x
# [1] 1 11 2 12 3 13
And one more that shows append
c(sapply(seq_along(a), function(i) append(a[i], b[i], i)))
# [1] 1 11 2 12 3 13
Just wanted to add a simpler solution that works for when vectors are unequal length and you want to append the extra data to the end.
> a <- 1:3
> b <- 11:17
> c(a, b)[order(c(seq_along(a)*2 - 1, seq_along(b)*2))]
[1] 1 11 2 12 3 13 14 15 16 17
Explanation:
c(a, b) creates a vector of the values in a and b.
seq_along(a)*2 - 1 creates a vector of the first length(a) odd numbers.
seq_along(b)*2 creates a vector of the first length(b) even numbers.
order(...) returns the indexes that would sort its argument, so that x[order(x)] is a sorted vector. Since the first seq_along supplies the odd numbers and the second seq_along the even numbers, order will take the first element from the first seq_along, then the first element of the second seq_along, then the second element from the first seq_along, and so on, interspersing the two sets of vector indexes and leaving the extra data at the tail.
By indexing c(a, b) using the order vector, we will intersperse a and b.
As a note, since seq_along returns integer(0) when the input is NULL, this solution works even if one of the vectors has length 0.
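The intermediate values make the mechanism easier to follow. A step-by-step sketch of the same expression (the names keys and idx are just for illustration):

```r
a <- 1:3
b <- 11:17

# Odd keys for a, even keys for b
keys <- c(seq_along(a) * 2 - 1, seq_along(b) * 2)
keys
#[1]  1  3  5  2  4  6  8 10 12 14

idx <- order(keys)   # positions in c(a, b), sorted by key
idx
#[1]  1  4  2  5  3  6  7  8  9 10

merged <- c(a, b)[idx]
merged
#[1]  1 11  2 12  3 13 14 15 16 17

# Zero-length first input: only b's keys remain, so b comes back unchanged
e <- integer(0)
only_b <- c(e, b)[order(c(seq_along(e) * 2 - 1, seq_along(b) * 2))]
only_b
#[1] 11 12 13 14 15 16 17
```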
I had to solve a similar problem, but my vectors were of unequal length, and I didn't want to recycle the shorter vector, just append the tail of the longer one.
The solution from @RichardScriven didn't work for me (though I may have done something wrong and didn't try hard to troubleshoot).
Here is my solution:
#' Riffle-merges two vectors, possibly of different lengths
#'
#' Takes two vectors and interleaves the elements. If one vector is longer than
#' the other, it appends on the tail of the longer vector to the output vector.
#' @param a First vector
#' @param b Second vector
#' @return Interleaved vector as described above.
#' @author Matt Pettis
riffle <- function(a, b) {
len_a <- length(a)
len_b <- length(b)
len_comm <- pmin(len_a, len_b)
len_tail <- abs(len_a - len_b)
if (len_a < 1) stop("First vector has length less than 1")
if (len_b < 1) stop("Second vector has length less than 1")
riffle_common <- c(rbind(a[1:len_comm], b[1:len_comm]))
if (len_tail == 0) return(riffle_common)
if (len_a > len_b) {
return(c(riffle_common, a[(len_comm + 1):len_a]))
} else {
return(c(riffle_common, b[(len_comm + 1):len_b]))
}
}
# Try it out
riffle(1:7, 11:13)
[1] 1 11 2 12 3 13 4 5 6 7
riffle(1:3, 11:17)
[1] 1 11 2 12 3 13 14 15 16 17
HTH,
Matt
@MBo's answer to my question at https://stackoverflow.com/a/58773002/2556061 implies a solution for evenly interlacing vectors of unequal length. I'm reporting it here for reference.
interleave <- function(x, y)
{
m <- length(x)
n <- length(y)
xi <- yi <- 1
len <- m + n
err <- len %/% 2
res <- vector()
for (i in 1:len)
{
err <- err - m
if (err < 0)
{
res[i] <- x[xi]
xi <- xi + 1
err <- err + len
} else
{
res[i] <- y[yi]
yi <- yi + 1
}
}
res
}
gives
interleave(1:10, 100:120)
c(100, 1, 101, 102, 2, 103, 104, 3, 105, 106, 4, 107, 108, 5, 109, 110, 111, 6, 112, 113, 7, 114, 115, 8, 116, 117, 9, 118, 119, 10, 120)
A tidyverse approach is vctrs::vec_interleave:
vctrs::vec_interleave(a, b)
#[1] 1 11 2 12 3 13

lapply needs boolean after if-statement condition

I'm new to R. I wrote a function that applies to numbers and want to apply it to a numeric of length 400. It goes
EGIDS.to.IUCN <- function(x){
if(x==10){return(NA)} # 10 (Extinct)
if(x==9){return(NA)} # 9 (Dormant)
if(x==8.5){return(4)} # 8.5 (Nearly Extinct) → 4 (Critically endangered)
# 10 more similar lines here (no more NAs)
else{stop}
}
I tried using lapply but then I get
> austroIUCN <- lapply(austroEGIDS, EGIDS.to.IUCN)
Error in if (x == 10) { : missing value where TRUE/FALSE needed
Where austroEGIDS is a list of 400 numbers from 0 to 10. I'm totally lost here. Why does it expect a boolean after closing the if condition?
It would be more efficient if you use a numeric vector and work with vectorized statements:
austroIUCN <- unlist(austroEGIDS)
austroIUCN[austroIUCN==10 | austroIUCN==9] <- NA
austroIUCN[austroIUCN==8.5] <- 4
...
Each statement sets all entries with the given level at once.
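One way to keep such a vectorized recoding in a single place is a lookup table combined with match(). The sketch below only includes the three mappings shown in the question (10 and 9 to NA, 8.5 to 4); the remaining levels are not given here and would have to be added. The names egids_levels, iucn_values and egids_to_iucn are made up for this example:

```r
# Parallel vectors: egids_levels[k] maps to iucn_values[k]
egids_levels <- c(10, 9, 8.5)
iucn_values  <- c(NA, NA, 4)

egids_to_iucn <- function(x) {
  # match() gives the position of each x in egids_levels, or NA if absent;
  # indexing with NA yields NA, so NA and unknown levels both map to NA
  iucn_values[match(x, egids_levels)]
}

egids_to_iucn(c(8.5, 10, NA, 9, 3))
#[1]  4 NA NA NA NA
```

Note that in this sketch an unknown level silently becomes NA instead of raising an error.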
Without the stop this should work,
EGIDS.to.IUCN <- function(x) {
if (is.na(x)){ NA } else
if (x == 10) { NA } else
if (x == 9) { NA } else
if(x == 8.5) { 4 } else
NA
}
or, more readable and faster,
EGIDS.to.IUCN <- function(x){
# switch() only matches names when its first argument is a character string
switch(as.character(x), 'NA'=NA, '10'=NA, '9'=NA, '8.5'=4, NA)
}
austroEGIDS <- sample(seq(1, 10, .5), 400, replace = TRUE)
austroIUCN <- sapply(austroEGIDS, EGIDS.to.IUCN)
table(unlist(austroIUCN), useNA = "ifany")
austroIUCN
   4 <NA>
  23  377
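One detail worth checking with switch(): names are only matched when the first argument is a character string. With a number, switch() selects an alternative by position instead. A small illustration with made-up values:

```r
by_name  <- switch("b", a = "first", b = "second")   # "second": matched by name
by_pos   <- switch(2, a = "first", b = "second")     # "second": 2nd alternative, names ignored
no_match <- switch(3, a = "first", b = "second")     # NULL: there is no 3rd alternative
as_text  <- switch(as.character(8.5), "8.5" = 4, NA) # 4: converting first enables name matching
```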
Or if you want it to stop and throw an error if not a match,
EGIDS.to.IUCN <- function(x){
# switch() only matches names when its first argument is a character string
switch(as.character(x), 'NA'=NA, '10'=NA, '9'=NA, '8.5'=4, stop("Not a match!"))
}
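As for the original error message: it typically means the input contains NA. Comparing NA with a number gives NA, and if() needs a single TRUE or FALSE. A minimal sketch reproducing the message and two ways to guard against it (the helper name is_level is hypothetical, and the message text shown is the one from an English locale):

```r
x <- NA_real_
x == 10    # NA, not FALSE

# Capture the error that if() raises on an NA condition
msg <- tryCatch(
  if (x == 10) "ten" else "other",
  error = function(e) conditionMessage(e)
)
msg
#[1] "missing value where TRUE/FALSE needed"

# Guard 1: test for NA before comparing
guarded <- if (!is.na(x) && x == 10) "ten" else "other"   # "other"

# Guard 2: isTRUE() maps NA (and zero-length results) to FALSE
is_level <- function(x, level) isTRUE(x == level)
is_level(NA, 10)   # FALSE
is_level(10, 10)   # TRUE
```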

How could I make this R snippet faster and more R-ish?

Coming from various other languages, I find R powerful and intuitive, but I am not thrilled with its performance. So I decided to try to improve some snippet I wrote and learn how to code better in R.
Here's a function I wrote, trying to determine if a vector is binary-valued (two distinct values or just one value) or not:
isBinaryVector <- function(v) {
if (length(v) == 0) {
return (c(0, 1))
}
a <- v[1]
b <- a
lapply(v, function(x) { if (x != a && x != b) {if (a != b) { return (c()) } else { b = x }}})
if (a < b) {
return (c(a, b))
} else {
return (c(b, a))
}
}
EDIT: This function is expected to look through a vector, then return c() if it is not binary-valued, and return c(a, b) if it is, a being the smaller value and b the larger one (if a == b then just c(a, a)). E.g., for
  A B C
1 1 1 0
2 2 2 0
3 3 1 0
I will lapply this isBinaryVector and get:
$A
[1] 1 1
$B
[1] 1 1
$C
[1] 0 0
The time it took on a moderate sized dataset (about 1800 * 3500, 2/3 of them are binary-valued) is about 15 seconds. The set contains only floating-point numbers.
Is there any way I could do this faster?
Thanks for any inputs!
You are essentially trying to write a function that returns TRUE if a vector has exactly two unique values, and FALSE otherwise.
Try this:
> dat <- data.frame(
+ A = 1:3,
+ B = c(1, 2, 1),
+ C = 0
+ )
>
> sapply(dat, function(x)length(unique(x))==2)
    A     B     C
FALSE  TRUE FALSE
Next, you want to get the min and max value. The function range does this. So:
> sapply(dat, range)
     A B C
[1,] 1 1 0
[2,] 3 2 0
And there you have all the ingredients to make a small function that is easy to understand and should be extremely quick, even on large amounts of data:
isBinary <- function(x)length(unique(x))==2
binaryValues <- function(x){
if(isBinary(x)) range(x) else NA
}
sapply(dat, binaryValues)
$A
[1] NA
$B
[1] 1 2
$C
[1] NA
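The question also counts a vector with a single distinct value as binary-valued and expects c(a, a) for it. A sketch closer to that specification, returning NULL for anything else (binary_range is a made-up name, and ignoring NAs is an assumption, since the question does not say how they should be treated):

```r
binary_range <- function(v) {
  u <- unique(v[!is.na(v)])   # distinct observed values
  if (length(u) == 0L || length(u) > 2L) return(NULL)
  range(u)                    # c(min, max); c(a, a) when only one value occurs
}

dat <- data.frame(A = 1:3, B = c(1, 2, 1), C = 0)
res <- lapply(dat, binary_range)
# res$A is NULL, res$B is c(1, 2), res$C is c(0, 0)
```

unique() and range() are both vectorized, so this avoids the element-by-element lapply over v used in the question.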
This function returns true or false for vectors (or columns of a data frame):
is.binary <- function(v) {
x <- unique(v)
length(x) - sum(is.na(x)) == 2L
}
Also take a look at this post
I'd use something like this to get column indices:
bivalued <- apply(my.data.frame, 2, is.binary)
nominal <- my.data.frame[,!bivalued]
binary <- my.data.frame[,bivalued]
Sample data:
my.data.frame <- data.frame(c(0,1), rnorm(100), c(5, 19), letters[1:5], c('a', 'b'))
> apply(my.data.frame, 2, is.binary)
     c.0..1.   rnorm.100.     c.5..19. letters.1.5.  c..a....b..
        TRUE        FALSE         TRUE        FALSE         TRUE
