Behavior of identical() in apply in R

This is weird.
apply( matrix(c(1,NA,2,3,NA,NA,2,4),ncol = 2), 1, function(x) identical(x[1], x[2]) )
#[1] FALSE TRUE TRUE FALSE
apply( data.frame(a = c(1,NA,2,3),b = c(NA,NA,2,4)), 1, function(x) identical(x[1], x[2]) )
#[1] FALSE FALSE FALSE FALSE
apply( as.matrix(data.frame(a = c(1,NA,2,3),b = c(NA,NA,2,4))), 1, function(x) identical(x[1], x[2]) )
#[1] FALSE FALSE FALSE FALSE
This is due to the names attribute as indicated below by joran. I can obtain the result I expected by:
apply( data.frame(a = c(1,NA,2,3),b = c(NA,NA,2,4)), 1, function(x) identical(unname(x[1]), unname(x[2])) )
or:
apply( data.frame(a = c(1,NA,2,3),b = c(NA,NA,2,4)), 1, function(x) identical(x[[1]], x[[2]]) )
Is there a more natural way to approach this? It would seem that there should be an option to ignore attributes, like in all.equal().

Probably
mapply(identical, x$a, x$b)
#[1] FALSE TRUE TRUE FALSE
where x is a data frame.
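As an aside, using apply with a data frame is almost always a mistake. It will coerce the data frame to a matrix, which often leads to unexpected results: all columns are converted to a common type, and each row is passed to the function as a vector named after the columns.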
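The effect of the names attribute can be seen on a single row. The following is a minimal sketch using the data from the question; the variable names m, row3 and res are illustrative only.

```r
x <- data.frame(a = c(1, NA, 2, 3), b = c(NA, NA, 2, 4))

# apply() works on as.matrix(x); each row it passes to the function
# is a vector named after the columns.
m <- as.matrix(x)
row3 <- m[3, ]          # values 2 and 2, names "a" and "b"

identical(row3[1], row3[2])                  # FALSE: the names "a" and "b" differ
identical(row3[[1]], row3[[2]])              # TRUE: [[ drops the names
identical(unname(row3[1]), unname(row3[2]))  # TRUE

# Comparing the columns directly avoids the coercion altogether.
res <- mapply(identical, x$a, x$b)
res
#[1] FALSE  TRUE  TRUE FALSE
```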

Related

R: how to check if a vector is found in another vector of different length without using %in%

vector_1 = c(4,3,5,1,2)
vector_2 = c(3,1)
output:
[1] FALSE TRUE FALSE TRUE FALSE
How do I get this output using only basic operators and loops, without the %in% operator or any functions in R?
See match.fun(`%in%`)
match(vector_1,vector_2, nomatch = 0) > 0
"Without functions" is a bit vague, since virtually everything in R is a function call. This is probably an assignment, and a for loop is wanted.
res <- logical(length(vector_1))
for (i in seq_along(vector_1)) {
for (j in seq_along(vector_2)) {
if (vector_1[i] == vector_2[j])
res[i] <- TRUE
}
}
res
# [1] FALSE TRUE FALSE TRUE FALSE
However, that's not very idiomatic R; a vectorized approach would be something like
apply(outer(vector_1, vector_2, `==`), 1, \(x) x[which.max(x)])
# [1] FALSE TRUE FALSE TRUE FALSE
Data:
vector_1 <- c(4, 3, 5, 1, 2)
vector_2 <- c(3, 1)
One way with sapply() -
sapply(vector_1, function(x) any(x == vector_2))
[1] FALSE TRUE FALSE TRUE FALSE
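As a further variation on the outer() idea above, the comparison matrix can be reduced with rowSums() instead of apply(). This is only a sketch using the same data; the name res is illustrative.

```r
vector_1 <- c(4, 3, 5, 1, 2)
vector_2 <- c(3, 1)

# outer() builds a 5 x 2 logical matrix of all pairwise comparisons;
# a row sum above zero means the element matched at least once.
res <- rowSums(outer(vector_1, vector_2, `==`)) > 0
res
# [1] FALSE  TRUE FALSE  TRUE FALSE
```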

how to filter TRUE values from logical results of sapply?

I am new to R. To check which columns contain only negative (or zero) values, I used sapply():
sapply(dt1, FUN = function(x) all(x <= 0, na.rm = TRUE))
########## output ##########
SK_ID_CURR TARGET NAME_CONTRACT_TYPE
FALSE FALSE FALSE
CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY
FALSE FALSE FALSE
CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT
FALSE FALSE FALSE
AMT_ANNUITY AMT_GOODS_PRICE NAME_TYPE_SUITE
FALSE FALSE FALSE
NAME_INCOME_TYPE NAME_EDUCATION_TYPE NAME_FAMILY_STATUS
FALSE FALSE FALSE
NAME_HOUSING_TYPE REGION_POPULATION_RELATIVE DAYS_BIRTH
FALSE FALSE TRUE
DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH
TRUE TRUE TRUE
OWN_CAR_AGE FLAG_MOBIL FLAG_EMP_PHONE
FALSE FALSE FALSE
FLAG_WORK_PHONE FLAG_CONT_MOBILE FLAG_PHONE
FALSE FALSE FALSE
FLAG_EMAIL OCCUPATION_TYPE CNT_FAM_MEMBERS
FALSE FALSE FALSE
I want to keep only the TRUE ones from these, and have tried:
dt1[,sapply(dt1, FUN = function(x) all(x <= 0, na.rm = TRUE)) == TRUE]
but this didn't give what I expected; I got the same result as before.
I even tried the code below, but that didn't help either:
dt1[,lapply(dt1, FUN = function(x) all(x <= 0, na.rm = TRUE)) == TRUE]
You don't need == TRUE. With a data frame, this should have worked:
dt1[, sapply(dt1, FUN = function(x) all(x <= 0, na.rm = TRUE))]
But I think you have a data.table; try adding with = FALSE.
library(data.table)
dt1[, sapply(dt1, FUN = function(x) all(x <= 0, na.rm = TRUE)), with = FALSE]
With data.table, we can use
library(data.table)
dt1[, dt1[, !colSums(.SD > 0)], with = FALSE]
Or using select
library(dplyr)
dt1 %>%
select(where(~all(. <= 0)))
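For a plain data frame, the logical vector returned by sapply() can be used either to list the matching column names or to subset the columns. The sketch below uses a small made-up data frame (df and its columns are hypothetical stand-ins for dt1) and base R only.

```r
# Toy data: only days_birth holds exclusively non-positive values.
df <- data.frame(id = 1:3,
                 days_birth = c(-100, -200, NA),
                 income = c(10, 20, 30))

keep <- sapply(df, function(x) all(x <= 0, na.rm = TRUE))

# Names of the columns for which the test is TRUE
names(keep)[keep]
# [1] "days_birth"

# The columns themselves; drop = FALSE keeps a data frame
# even when a single column is selected.
neg_cols <- df[, keep, drop = FALSE]
```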

Is the argument included in the page list?

I want to define the operator %in%, whose operation is to return TRUE if the argument on the left is in the list on the right and FALSE otherwise. The task should be implemented without using a loop.
### Creating a simple list
ell <- list( 2, c( 2, 5), list( c( 2, 8)), "xyz")
### Testing of selected elements
2 %in% ell
5 %in% ell
list( c( 2, 8)) %in% ell
list( list( 2, 8)) %in% ell
"xyz" %in% ell
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
[1] TRUE
As MrFlick said in a comment, do not override built-in operators; it will definitely break something.
Try this one and see if it does what you want. I have named the new operator %IN%, since R is case sensitive.
`%IN%` <- function(x, y){
x %in% unlist(y, recursive = FALSE)
}
2 %IN% ell
#[1] TRUE
5 %IN% ell
#[1] TRUE
list( c( 2, 8)) %IN% ell
#[1] TRUE
list( list( 2, 8)) %IN% ell
#[1] FALSE
"xyz" %IN% ell
#[1] TRUE
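Note that the version above returns TRUE for 5, whereas the question expected FALSE, because unlist() flattens c(2, 5) into separate elements. If "in the list" is taken to mean "identical to one of the top-level elements", a sketch along the following lines may be closer to the expected output. The operator name %in_list% is hypothetical, and vapply() avoids an explicit loop but still iterates internally.

```r
ell <- list(2, c(2, 5), list(c(2, 8)), "xyz")

# TRUE if x is identical to at least one top-level element of y.
`%in_list%` <- function(x, y) {
  any(vapply(y, function(el) identical(el, x), logical(1)))
}

2 %in_list% ell
#[1] TRUE
5 %in_list% ell
#[1] FALSE
list(c(2, 8)) %in_list% ell
#[1] TRUE
list(list(2, 8)) %in_list% ell
#[1] FALSE
"xyz" %in_list% ell
#[1] TRUE
```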

Applying operators stored in a vector as text

I would like to apply operators stored in a vector operatorsUsed to series1 and series2 of the data frame df:
operatorsUsed = c('==', '>=', '<=')
series1 = 1:5
series2 = c(1, 3, 2, 4, 5)
df = data.frame(series1,
series2,
stringsAsFactors = FALSE)
I tried combining the parse() and eval() function:
nbrOperators = length(operatorsUsed)
for (j in 1:nbrOperators){
a = df[eval(parse(text = paste0(df$series1, operatorsUsed[j], df$series2))),]
tableCreated = paste0('b', j)
assign(tableCreated, a)
}
But this doesn't work. With parse(), I obtain, e.g. for j = 1:
expression(1==1, 2==3, 3==2, 4==4, 5==5)
Which looks promising but then applying eval yields
[1] TRUE
rather than the desired
[1] TRUE FALSE FALSE TRUE TRUE
Is there a way I can apply operators stored in a vector as text?
We can use lapply with get
lapply(operatorsUsed, function(op) get(op)(df$series1, df$series2))
#[[1]]
#[1] TRUE FALSE FALSE TRUE TRUE
#[[2]]
#[1] TRUE FALSE TRUE TRUE TRUE
#[[3]]
#[1] TRUE TRUE FALSE TRUE TRUE
As @rawr mentioned in the comments, we can also use match.fun(op) instead of get(op) in the lapply call.
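The single TRUE in the question arises because eval() on an expression vector evaluates every element in turn and returns only the value of the last one. The sketch below illustrates this with the question's data, and then shows the match.fun() variant; the names exprs, last_only, each and res are illustrative.

```r
operatorsUsed <- c("==", ">=", "<=")
series1 <- 1:5
series2 <- c(1, 3, 2, 4, 5)

exprs <- parse(text = paste0(series1, "==", series2))

# eval() on the whole expression vector returns only the last result (5==5).
last_only <- eval(exprs)

# Evaluating element by element gives one result per comparison.
each <- vapply(exprs, eval, logical(1))
each
# [1]  TRUE FALSE FALSE  TRUE  TRUE

# Looking the operator up as a function avoids parse()/eval() entirely.
res <- lapply(operatorsUsed, function(op) match.fun(op)(series1, series2))
names(res) <- operatorsUsed
```

Here res[[">="]] holds the element-wise result of series1 >= series2, and likewise for the other operators.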

Search a matrix for rows with given values in any order

I have a matrix and a vector with values:
mat<-matrix(c(1,1,6,
3,5,2,
1,6,5,
2,2,7,
8,6,1),nrow=5,ncol=3,byrow=T)
vec<-c(1,6)
This is a small subset of a N by N matrix and 1 by N vector. Is there a way so that I can subset the rows with values in vec?
The most straightforward way of doing this that I know of would be to use the subset function:
subset(mat, mat[,1] == 1 & mat[,2] == 6) #etc etc
The problem with subset is that you have to specify in advance the columns to look in and the specific combination to look for. My problem is structured so that I want to find all rows containing the numbers in "vec" in any order and in any columns. So in the above example, I want to get a return matrix of:
1,1,6
1,6,5
8,6,1
Any ideas?
You can do
apply(mat, 1, function(x) all(vec %in% x))
# [1] TRUE FALSE TRUE FALSE TRUE
but this may give you unexpected results if vec contains repeated values:
vec <- c(1, 1)
apply(mat, 1, function(x) all(vec %in% x))
# [1] TRUE FALSE TRUE FALSE TRUE
so you would have to use something more complicated using table to account for repetitions:
vec <- c(1, 1)
is.sub.table <- function(table1, table2) {
all(names(table1) %in% names(table2)) &&
all(table1 <= table2[names(table1)])
}
apply(mat, 1, function(x)is.sub.table(table(vec), table(x)))
# [1] TRUE FALSE FALSE FALSE FALSE
However, if the length of vec equals the number of columns in your matrix (as your description suggests, although that is not the case in your example), you can simply do:
vec <- c(1, 6, 1)
apply(mat, 1, function(x) all(sort(vec) == sort(x)))
# [1] TRUE FALSE FALSE FALSE FALSE
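Each of the calls above returns a logical vector with one entry per row. To obtain the matching rows themselves, that vector can be used as a row index. A minimal sketch with the original vec (the names keep and hits are illustrative):

```r
mat <- matrix(c(1, 1, 6,
                3, 5, 2,
                1, 6, 5,
                2, 2, 7,
                8, 6, 1), nrow = 5, ncol = 3, byrow = TRUE)
vec <- c(1, 6)

keep <- apply(mat, 1, function(x) all(vec %in% x))

# drop = FALSE keeps the result a matrix even if only one row matches.
hits <- mat[keep, , drop = FALSE]
hits
#      [,1] [,2] [,3]
# [1,]    1    1    6
# [2,]    1    6    5
# [3,]    8    6    1
```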
