I was trying to use the pmax function in my program. I had a data frame of numbers and wanted to compare it with a single number, but the output contained NAs.
I figured out that pmax does not work with a data frame, so I converted the data frame to a matrix, and it worked. I am curious why the data frame version returns NA. Does it have something to do with recycling?
Code:
mat <- matrix(runif(100), nrow = 20, ncol = 5) # 20 x 5 = 100 values
df <- as.data.frame(mat)
pmax(mat, .5) # No NA's
pmax(df, .5) # Many NA's
The single value in the second argument is not recycled to the full size of the data frame: the amount of recycling depends on the number of columns, because the length of a data frame is its number of columns. This is the likely cause:
rep(0.5, ncol(df))[df < 0.5]
#[1] 0.5 0.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#[41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Note that only the first 2 values come back as 0.5. They correspond to the TRUE entries of df < 0.5 that fall within the first 5 positions. Every later TRUE entry indexes past the end of the replicated vector, which has only ncol(df) elements, and therefore returns NA.
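The length mismatch described above can be checked directly. This is a minimal sketch using the same seed as the data section; the names `each`, `change` and `picked` are illustrative and only mimic what happens inside pmax.

```r
set.seed(24)
mat <- matrix(runif(100), nrow = 20, ncol = 5)
df <- as.data.frame(mat)

length(mat)  # number of cells in the matrix
length(df)   # number of columns of the data frame

# The single value is recycled to length(df), not to the number of cells
each <- rep(0.5, length.out = length(df))

# The comparison still yields one logical per cell (a 20 x 5 matrix)
change <- df < 0.5

# Indexing a length-5 vector with 100 logicals: TRUE entries past
# position 5 have nothing to select and come back as NA
picked <- each[change]
```

Only the TRUE entries among the first `length(df)` positions of `change` can return 0.5; all the others produce NA.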
If we look at the source of pmax, the line
mmm[change] <- each[change]
is problematic. We can check this by printing 'each' and 'each[change]'. Here is a copy of the function with print statements added:
pmax2 <- function (..., na.rm = FALSE)
{
    elts <- list(...)
    if (length(elts) == 0L)
        stop("no arguments")
    if (all(vapply(elts, function(x) is.atomic(x) && !is.object(x),
        NA))) {
        mmm <- .Internal(pmax(na.rm, ...))
        mostattributes(mmm) <- attributes(elts[[1L]])
    }
    else {
        mmm <- elts[[1L]]
        has.na <- FALSE
        as <- methods::as
        asL <- function(x) if (isS4(x))
            as(x, "logical")
        else x
        for (each in elts[-1L]) {
            l1 <- length(each)
            l2 <- length(mmm)
            if (l2 && (l2 < l1 || !l1)) {
                if (l1%%l2)
                    warning("an argument will be fractionally recycled")
                mmm <- rep(mmm, length.out = l1)
            }
            else if (l1 && (l1 < l2 || !l2)) {
                if (l2%%l1)
                    warning("an argument will be fractionally recycled")
                each <- rep(each, length.out = l2)
            }
            na.m <- is.na(mmm)
            na.e <- is.na(each)
            if (has.na || (has.na <- any(na.m) || any(na.e))) {
                if (any(na.m <- asL(na.m)))
                    mmm[na.m] <- each[na.m]
                if (any(na.e <- asL(na.e)))
                    each[na.e] <- mmm[na.e]
            }
            nS4 <- !isS4(mmm)
            if (isS4(change <- mmm < each) && (nS4 || !isS4(each)))
                change <- as(change, "logical")
            change <- change & !is.na(change)
            print(change)
            mmm[change] <- each[change]
            print(each)
            print(each[change])
            if (has.na && !na.rm)
                mmm[na.m | na.e] <- NA
            if (nS4)
                mostattributes(mmm) <- attributes(elts[[1L]])
        }
    }
    mmm
}
Now, we check the print output based on applying pmax2 on 'df'
invisible(pmax2(df, 0.5))
# V1 V2 V3 V4 V5
# [1,] TRUE TRUE TRUE TRUE FALSE
# [2,] TRUE FALSE TRUE TRUE TRUE
# [3,] FALSE FALSE TRUE TRUE FALSE
# [4,] FALSE TRUE TRUE TRUE TRUE
# [5,] FALSE TRUE TRUE FALSE TRUE
# [6,] FALSE FALSE TRUE TRUE TRUE
# [7,] TRUE TRUE TRUE FALSE TRUE
# [8,] FALSE FALSE TRUE FALSE FALSE
# [9,] FALSE FALSE TRUE FALSE TRUE
#[10,] TRUE TRUE TRUE TRUE FALSE
#[11,] FALSE TRUE TRUE TRUE TRUE
#[12,] TRUE TRUE FALSE TRUE FALSE
#[13,] FALSE TRUE TRUE TRUE FALSE
#[14,] FALSE TRUE FALSE FALSE TRUE
#[15,] TRUE FALSE FALSE FALSE TRUE
#[16,] FALSE TRUE FALSE TRUE FALSE
#[17,] TRUE FALSE TRUE FALSE FALSE
#[18,] TRUE FALSE TRUE FALSE TRUE
#[19,] FALSE FALSE TRUE TRUE TRUE
#[20,] TRUE FALSE TRUE FALSE TRUE
#[1] 0.5 0.5 0.5 0.5 0.5
# [1] 0.5 0.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#[41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Note that this is exactly the same output we got with the rep mentioned earlier.
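However, for a matrix this branch is never executed: a matrix is atomic and is not an object, so the first branch of the if/else hands the work to the internal implementation.
invisible(pmax2(mat, 0.5))
# nothing is printed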
When comparing against a single value, it is better to apply pmax to a matrix than to a data.frame. Otherwise, we can unlist the data.frame or convert it to a matrix first:
all.equal(c(pmax(mat, .5)), pmax(unlist(df), .5), check.attributes = FALSE)
#[1] TRUE
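If the result should stay a data.frame, one possible alternative (not part of the answer above, just a sketch) is to apply pmax to each column separately, since every column is a plain atomic vector. The name `clamped` is illustrative.

```r
set.seed(24)
df <- as.data.frame(matrix(runif(100), nrow = 20, ncol = 5))

# Apply pmax column by column; assigning into clamped[] keeps the
# data.frame structure (column names, row names, class)
clamped <- df
clamped[] <- lapply(df, pmax, 0.5)

anyNA(clamped)   # check for NAs in the result
```

Each call is `pmax(column, 0.5)` on an atomic vector, so it takes the same internal code path as the matrix case.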
data
set.seed(24)
mat <- matrix(runif(100), nrow = 20, ncol = 5) # 20 x 5 = 100 values
df <- as.data.frame(mat)
I cannot understand the properties of logical (boolean) values TRUE, FALSE and NA when used with logical OR (|) and logical AND (&). Here are some examples:
NA | TRUE
# [1] TRUE
NA | FALSE
# [1] NA
NA & TRUE
# [1] NA
NA & FALSE
# [1] FALSE
Can you explain these outputs?
To quote from ?Logic:
NA is a valid logical object. Where a component of x or y is NA, the
result will be NA if the outcome is ambiguous. In other words NA &
TRUE evaluates to NA, but NA & FALSE evaluates to FALSE. See the
examples below.
The key there is the word "ambiguous". NA represents something that is "unknown". So NA & TRUE could be either true or false, but we don't know. Whereas NA & FALSE will be false no matter what the missing value is.
It's explained in help("|"):
NA is a valid logical object. Where a component of x or y
is NA, the result will be NA if the outcome is ambiguous. In
other words NA & TRUE evaluates to NA, but NA & FALSE
evaluates to FALSE. See the examples below.
From the examples in help("|"):
x <- c(NA, FALSE, TRUE)
names(x) <- as.character(x)
outer(x, x, "&") ## AND table
# <NA> FALSE TRUE
# <NA> NA FALSE NA
# FALSE FALSE FALSE FALSE
# TRUE NA FALSE TRUE
outer(x, x, "|") ## OR table
# <NA> FALSE TRUE
# <NA> NA NA TRUE
# FALSE NA FALSE TRUE
# TRUE TRUE TRUE TRUE
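The "ambiguous" rule can be reproduced by hand: replace the unknown value with both TRUE and FALSE and see whether the two results agree. The helper `resolve` below is hypothetical and written only for illustration; it is not how R implements the operators.

```r
# Try both possible values for the unknown operand; if the outcomes
# agree the result is known, otherwise it stays NA
resolve <- function(op, known) {
  outcomes <- c(op(TRUE, known), op(FALSE, known))
  if (outcomes[1] == outcomes[2]) outcomes[1] else NA
}

resolve(`&`, FALSE)  # both substitutions give the same answer
resolve(`&`, TRUE)   # the substitutions disagree
resolve(`|`, TRUE)   # both substitutions give the same answer
resolve(`|`, FALSE)  # the substitutions disagree
```

The results match what `NA & FALSE`, `NA & TRUE`, `NA | TRUE` and `NA | FALSE` return.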
When using ls() in a function, it lists arguments of the function even if they've not been evaluated yet (even if they are missing from the call with no default value).
fun <- function(x,y,z,m){
a <- 1
y <- 1
force(z)
print(ls())
mget(ls())
}
fun(i,j,42)
# [1] "a" "m" "x" "y" "z"
# Error in mget(ls()) : object 'i' not found
How can I list only the evaluated variables?
In that case I would be happy with a modified list giving either of:
# [1] "a" "y" "z"
# [1] "a" "y"
Alternatively (or additionally), a logical list telling me whether each argument has been evaluated (or overwritten) would be great. In that case: list(x = FALSE, y = TRUE, z = TRUE, m = FALSE)
Well, this is kind of close: there is an is_promise function in pryr. It expects a symbol, but the unexported version is_promise2 can take a name. So something like this, maybe:
fun <- function(x,y,z,m){
a <- 1
y <- 1
force(z)
mget(ls()[!sapply(ls(), pryr:::is_promise2, environment())])
}
fun(i, j, 42)
which at least gets rid of the error about i, but it doesn't seem to capture x. Just as is_promise2 does, I think you're going to have to dip into C/C++ land to find out information about evaluation/promise status, because I think R tries to hide most of that from the user.
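The lazy evaluation behind all of this can be seen with base R alone. This is a small sketch, independent of pryr; `fun`, `touch_x`, `listed` and `msg` are illustrative names.

```r
fun <- function(x, y, z) {
  force(z)  # evaluates the promise for z only
  ls()      # lists every formal, evaluated or not
}

# x and y are never touched, so their stop() calls never run
listed <- fun(stop("x was evaluated"), stop("y was evaluated"), 42)
listed

# As soon as the argument is accessed, its promise is evaluated
touch_x <- function(x) tryCatch(x, error = function(e) conditionMessage(e))
msg <- touch_x(stop("x was evaluated"))
msg
```

This is why mget(ls()) fails in the question: mget has to evaluate every listed name, including the promises that point at objects that do not exist.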
MrFlick's answer is what I was looking for. Additional relevant information can be gathered using the function below, which wraps trace for ease of use.
Better sample data
defined_in_global <- 1
enclosing_fun <- function(){
defined_in_enclos <- quote(qux)
function(not_evaluated,
overridden = "bar",
forced = "baz",
defined_in_global,
defined_in_enclos,
missing_with_default = 1,
missing_overriden,
missing_absent){
overridden <- TRUE
missing_overridden <- "a"
new_var <- 1
}
}
How to use, without trying to evaluate
fun <- enclosing_fun()
diagnose_vars(fun)
fun(not_evaluated = foo)
#> Tracing fun(not_evaluated = foo) on exit
#> name evaluable type is_formal missing absent_from_call is_promise has_default_value default_value called_with_value exists_in_parent exists_in_enclos
#> 1 not_evaluated FALSE <NA> TRUE FALSE FALSE TRUE FALSE NA foo FALSE FALSE
#> 2 overridden TRUE logical TRUE FALSE TRUE FALSE TRUE "bar" <NA> FALSE FALSE
#> 3 forced FALSE <NA> TRUE TRUE TRUE TRUE TRUE "baz" <NA> FALSE FALSE
#> 4 defined_in_global FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> TRUE TRUE
#> 5 defined_in_enclos FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE TRUE
#> 6 missing_with_default FALSE <NA> TRUE TRUE TRUE TRUE TRUE 1 <NA> FALSE FALSE
#> 7 missing_overriden FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 8 missing_absent FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 9 missing_overridden TRUE character FALSE NA NA NA NA NA <NA> FALSE FALSE
#> 10 new_var TRUE double FALSE NA NA NA NA NA <NA> FALSE FALSE
How to use, trying to evaluate
diagnose_vars(fun, eval = TRUE)
fun(not_evaluated = foo)
#> Tracing fun(not_evaluated = foo) on exit
#> name evaluable type is_formal missing absent_from_call is_promise has_default_value default_value called_with_value exists_in_parent exists_in_enclos
#> 1 not_evaluated TRUE <NA> TRUE FALSE FALSE TRUE FALSE NA foo FALSE FALSE
#> 2 overridden FALSE logical TRUE FALSE TRUE FALSE TRUE "bar" <NA> FALSE FALSE
#> 3 forced FALSE character TRUE TRUE TRUE TRUE TRUE "baz" <NA> FALSE FALSE
#> 4 defined_in_global TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> TRUE TRUE
#> 5 defined_in_enclos TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE TRUE
#> 6 missing_with_default FALSE double TRUE TRUE TRUE TRUE TRUE 1 <NA> FALSE FALSE
#> 7 missing_overriden TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 8 missing_absent TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 9 missing_overridden FALSE character FALSE NA NA NA NA NA <NA> FALSE FALSE
#> 10 new_var FALSE double FALSE NA NA NA NA NA <NA> FALSE FALSE
The code
diagnose_vars <- function(f, eval = FALSE, on.exit = TRUE, ...) {
eval(substitute(
if(on.exit) trace(..., what =f, exit = quote({
diagnose_vars0(eval, print = TRUE)
untrace(f)}))
else trace(..., what =f, tracer = diagnose_vars0(eval, print = TRUE),
exit = substitute(untrace(f)), ...)
))
invisible(NULL)
}
diagnose_vars0 <- function(eval = FALSE, print = FALSE){
f_env <- parent.frame()
mc <- eval(quote(match.call()), f_env)
f <- eval.parent(mc[[1]],2)
f_parent_env <- parent.frame(2)
f_enclos <- rlang::fn_env(f)
vars <- ls(f_env)
fmls <- eval(quote(formals()), f_env)
fml_nms <- names(fmls)
fml_syms <- rlang::syms(fml_nms)
mc_args <- as.list(mc)[-1]
# compute complete df cols when possible
is_formal <- vars %in% fml_nms
# build raw df, with NA cols when necessary to initiate
data <- data.frame(row.names = vars,
name = vars,
evaluable = NA,
type = NA,
is_formal,
missing = NA,
absent_from_call = NA,
is_promise = NA,
has_default_value = NA)
# absent_from_call : different from missing when variable is overriden
data[fml_nms, "absent_from_call"] <- ! fml_nms %in% names(mc_args)
# promise
data[fml_nms, "is_promise"] <- sapply(fml_nms, pryr:::is_promise2, f_env)
# missing
data[fml_nms, "missing"] <- sapply(fml_syms, function(x)
eval(substitute(missing(VAR), list(VAR = x)), f_env))
# has default values
formal_has_default_value <- !sapply(fmls,identical, alist(x=)[[1]])
data[fml_nms, "has_default_value"] <- formal_has_default_value
# default values
data$default_value <-
vector("list",length(vars))
data$default_value[] <- NA
data[fml_nms[formal_has_default_value], "default_value"] <-
sapply(fmls[formal_has_default_value], deparse)
# called_with_value
data[names(mc_args), "called_with_value"] <-
sapply(mc_args, deparse)
# exists
data$exists_in_parent <- sapply(vars, exists, envir= f_parent_env)
data$exists_in_enclos <- sapply(vars, exists, envir= f_enclos)
# types
if(eval){
types <- sapply(vars, function(x)
try(eval(bquote(typeof(.(as.symbol(x)))), f_env),silent = TRUE))
data$type <- ifelse(startsWith(types,"Error"), NA, types)
data$evaluable <- is.na(data$type)
} else {
data$evaluable <-
with(data,!is_formal | (!is_promise & !missing))
data$type[data$evaluable] <-
sapply(mget(vars[data$evaluable], f_env), typeof)
}
# arrange
data <- rbind(data[fml_nms,],data[!data$name %in% fml_nms,])
row.names(data) <- NULL
if (print) print(data) else data
}
I have a table look like this one
C1 C2 C3 C4 C5....
R1 FALSE FALSE TRUE TRUE
R2 FALSE FALSE NA TRUE
R3 NA NA NA TRUE
R4 NA FALSE FALSE FALSE
R5 NA NA NA NA
.
.
.
I want to keep all rows which contain at least one TRUE. In this table, R1, R2 and R3 need to be kept. Then, I can extract another column(C21)'s value from this same table.
Please give me some advice, thank you!
# Example
x <-
matrix(c(FALSE, FALSE, NA, NA, NA, FALSE, FALSE, NA, FALSE, NA, TRUE, NA, NA, FALSE, TRUE, TRUE, FALSE, NA, FALSE, FALSE),
nrow = 5, ncol = 4, dimnames = list(paste0("R", 1:5), paste0("C", 1:4)))
x
# C1 C2 C3 C4
# R1 FALSE FALSE TRUE TRUE
# R2 FALSE FALSE NA FALSE
# R3 NA NA NA NA
# R4 NA FALSE FALSE FALSE
# R5 NA NA TRUE FALSE
# apply the 'any()' function to the rows, this will return TRUE if there is at
# least one TRUE in the row (and NA if there is no TRUE but at least one NA)
apply(x, 1, any)
# R1 R2 R3 R4 R5
# TRUE NA NA NA TRUE
# use 'which' to get the row index
which(apply(x, 1, any))
# R1 R5
# 1 5
# subset the matrix
idx <- which(apply(x, 1, any))
x[idx, ]
# C1 C2 C3 C4
# R1 FALSE FALSE TRUE TRUE
# R5 NA NA TRUE FALSE
apply(X = df1, 1, any)
will give you a logical vector, that you can then use accordingly
i.e. df1[which(apply(df1, 1, any)), ]
We can use rowSums on the logical matrix (df1 & !is.na(df1)), check if the sum is greater than 0, use that logical vector to subset the rows.
Subdf <- df1[rowSums(df1 & !is.na(df1)) >0,]
Subdf
# C1 C2 C3 C4
#R1 FALSE FALSE TRUE TRUE
#R2 FALSE FALSE NA TRUE
#R3 NA NA NA TRUE
Or we can use the na.rm=TRUE in rowSums
df1[rowSums(df1, na.rm=TRUE) > 0,]
We can extract the 'C21' column by Subdf$C21 or Subdf[['C21']] (if the initial dataset is data.frame) or Subdf[, 'C21'] for matrix (in the example, I didn't have 21 columns)
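For reference, here is a self-contained sketch that rebuilds the four columns shown in the question as a data.frame and compares the approaches above. The object name `df1` follows the answers; `keep_sums` and `keep_any` are illustrative.

```r
# The table from the question, columns C1 to C4 only
df1 <- data.frame(
  C1 = c(FALSE, FALSE, NA, NA, NA),
  C2 = c(FALSE, FALSE, NA, FALSE, NA),
  C3 = c(TRUE, NA, NA, FALSE, NA),
  C4 = c(TRUE, TRUE, TRUE, FALSE, NA),
  row.names = paste0("R", 1:5)
)

# Count the TRUE values per row, ignoring NA
keep_sums <- rowSums(df1, na.rm = TRUE) > 0

# Same idea with any(); na.rm = TRUE turns the NA results into FALSE
keep_any <- apply(df1, 1, any, na.rm = TRUE)

df1[keep_sums, ]
```

Both logical vectors select rows R1, R2 and R3. Passing na.rm = TRUE to any() makes which() unnecessary, because the result no longer contains NA.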
If we have this data, recently used here:
data <- data.frame(name = rep(letters[1:3], each = 3),
var1 = rep(1:9), var2 = rep(3:5, each = 3))
name var1 var2
1 a 1 3
2 a 2 3
3 a 3 3
4 b 4 4
5 b 5 4
6 b 6 4
7 c 7 5
8 c 8 5
9 c 9 5
we can look for rows where var2 == 4.
data[data[,3] == 4 ,] # equally data[data$var2 == 4 ,]
# name var1 var2
#4 b 4 4
#5 b 5 4
#6 b 6 4
or rows where both var1 and var2 ==4
data[data[,2] == 4 & data[,3] == 4,]
# name var1 var2
#4 b 4 4
What I don't get is why this:
data[ data[ , 2:3 ] == 4 ,]
gives this:
name var1 var2
4 b 4 4
NA <NA> NA NA
NA.1 <NA> NA NA
NA.2 <NA> NA NA
#I would still hope to get
# name var1 var2
#4 b 4 4
Where do the NAs come from?
Your logical that you're subsetting on is a matrix:
> sel <- data[ , 2:3 ] == 4
> sel
var1 var2
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] TRUE TRUE
[5,] FALSE TRUE
[6,] FALSE TRUE
[7,] FALSE FALSE
[8,] FALSE FALSE
[9,] FALSE FALSE
According to help("[.data.frame"):
Matrix indexing (x[i] with a logical or a 2-column integer matrix i)
using [ is not recommended, and barely supported. For extraction, x is
first coerced to a matrix. For replacement, a logical matrix (only)
can be used to select the elements to be replaced in the same way as
for a matrix.
But that implies this form:
> data[ sel ]
[1] "b" "4" "5" "6" "4"
Badness. What you're doing is even less sensical, though, in that you're telling it you want only the rows (with your trailing comma), and then giving it a matrix to index on!
> data[sel,]
name var1 var2
4 b 4 4
NA <NA> NA NA
NA.1 <NA> NA NA
NA.2 <NA> NA NA
If you really wanted to use the matrix form, you could use apply to apply a logical operation across rows.
Your data[,2:3]==4 is the following :
R> data[,2:3]==4
var1 var2
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] TRUE TRUE
[5,] FALSE TRUE
[6,] FALSE TRUE
[7,] FALSE FALSE
[8,] FALSE FALSE
[9,] FALSE FALSE
Then you try to index the rows of your data frame with this matrix. To do this, R seems to first convert your matrix to a vector :
R> as.vector(data[,2:3]==4)
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE TRUE TRUE TRUE FALSE FALSE FALSE
It then selects the rows of data based on this vector. The TRUE at position 4 selects the 4th row, but the three other TRUE values sit at positions 13, 14 and 15, which are out of bounds for a 9-row data frame, so they return rows of NAs.
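The positions involved can be listed explicitly. This sketch only mirrors the explanation above (flatten the matrix, then use it as a row index); it does not claim to show the exact internals of `[.data.frame`. The names `sel`, `idx` and `rows` are illustrative.

```r
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9, var2 = rep(3:5, each = 3))

sel <- data[, 2:3] == 4      # 9 x 2 logical matrix
idx <- which(as.vector(sel)) # positions of TRUE after flattening
idx
nrow(data)                   # only 9 rows are available

# Row indices larger than nrow(data) produce rows filled with NA
rows <- data[idx, ]
rows
```

Positions 1 to 9 come from the var1 column and positions 10 to 18 from the var2 column, which is why the matches in var2 land outside the valid row range.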
data[ data[ , 2 ] == 4 | data[,3] == 4,]
name var1 var2
4 b 4 4
5 b 5 4
6 b 6 4
I suspect your method does not work because c() builds a vector, whereas you need to compare the atomic elements.
Because you're not passing a vector but a matrix to the index:
> data[ , 2:3 ] == 4
var1 var2
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] TRUE TRUE
[5,] FALSE TRUE
[6,] FALSE TRUE
[7,] FALSE FALSE
[8,] FALSE FALSE
[9,] FALSE FALSE
If you want the matrix collapsed into a vector that indexing works with here are two options:
data[ apply(data[ , 2:3 ] == 4, 1, all) ,]
data[ rowSums(data[ , 2:3 ] == 4) == 2 ,]
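A quick check of the two options, together with the "any column" variant for comparison. This is a sketch with illustrative names (`sel`, `both`, `either`).

```r
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9, var2 = rep(3:5, each = 3))

sel <- data[, 2:3] == 4

# One logical per row: TRUE only if every selected column equals 4
both <- apply(sel, 1, all)

# One logical per row: TRUE if at least one selected column equals 4
either <- apply(sel, 1, any)

data[both, ]    # rows where var1 and var2 are both 4
data[either, ]  # rows where var1 or var2 is 4
```

Because `both` and `either` have exactly one element per row, the row index never runs out of bounds and no NA rows appear.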