How to check a data.frame for any non-finite values in R
I'd like to check if a data.frame has any non-finite elements.
This seems to evaluate each column, returning FALSE for each (I'm guessing it's evaluating the data.frame as a list):
any( !is.finite( x ) )
I don't understand why this behaves differently from the above, but it works fine if just checking for NAs:
any( is.na( x ) )
I'd like the solution to be as efficient as possible. I realize I can just do...
any( !is.finite( as.matrix( x ) ) )
If you type methods(is.na) you'll see that it has a data.frame method, which probably explains why it works the way you expect, where is.finite does not. The usual solution would be to write one yourself, since it's only one line. Something like this maybe,
is.finite.data.frame <- function(obj){
  sapply(obj,FUN = function(x) all(is.finite(x)))
}
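As a minimal sketch of how that method could be used to answer the original question, the per-column result can be reduced to a single flag with any(). The data frame df and the names per_column and has_non_finite are made up for illustration.

```r
# Per-column method from the answer above: TRUE when a column is entirely finite.
is.finite.data.frame <- function(obj) {
  sapply(obj, FUN = function(x) all(is.finite(x)))
}

# Hypothetical example data: one clean column, one containing Inf.
df <- data.frame(a = c(1, 2, 3), b = c(1, Inf, 3))

per_column <- is.finite(df)          # dispatches to is.finite.data.frame
has_non_finite <- any(!per_column)   # single flag for the whole data.frame

print(per_column)
print(has_non_finite)
```

Note that this method returns one value per column rather than one per cell, unlike is.na() on a data.frame.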
I'm assuming the error you are getting is the following:
> any( is.infinite( z ) )
Error in is.infinite(z) : default method not implemented for type 'list'
This error is because the is.infinite() and the is.finite() functions are not implemented with a method for data.frames. The is.na() function does have a data.frame method.
The way to work around this is to apply() the function to every row, column, or element in the data.frame. Here's an example using sapply() to apply the is.infinite() function to each column:
x <- c(1:10, NA)
y <- c(1:11)
z <- data.frame(x,y)
any( sapply(z, is.infinite) ) # only Inf and -Inf count
## or, to also flag NA and NaN
any( ! sapply(z, is.finite) )
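The two checks are not interchangeable: is.infinite() is TRUE only for Inf and -Inf, while !is.finite() is also TRUE for NA and NaN. A small sketch with made-up data (the names na_only and with_inf are illustrative only):

```r
# Hypothetical data: one frame with only an NA, one that also contains Inf.
na_only  <- data.frame(x = c(1, NA, 3), y = c(1, 2, 3))
with_inf <- data.frame(x = c(1, NA, 3), y = c(1, 2, Inf))

# is.infinite() ignores NA, so the NA-only frame is not flagged.
na_only_infinite   <- any(sapply(na_only, is.infinite))
# !is.finite() treats NA as non-finite, so the same frame is flagged.
na_only_non_finite <- any(!sapply(na_only, is.finite))

with_inf_infinite   <- any(sapply(with_inf, is.infinite))
with_inf_non_finite <- any(!sapply(with_inf, is.finite))

print(c(na_only_infinite, na_only_non_finite))
print(c(with_inf_infinite, with_inf_non_finite))
```

Which one is appropriate depends on whether missing values should count as a problem.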
Your solution of calling as.matrix will only work if the data.frame only has numeric columns. Otherwise, the matrix will typically become a character matrix and the result will be false everywhere...
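A short sketch of that coercion problem, using a made-up data frame called mixed. Restricting the check to numeric columns first is one possible workaround, not the only one.

```r
# Hypothetical data: a numeric column with Inf next to a character column.
mixed <- data.frame(num = c(1, 2, Inf), chr = c("a", "b", "c"))

m <- as.matrix(mixed)
matrix_type <- typeof(m)              # the whole matrix is coerced to character
all_flagged <- all(!is.finite(m))     # is.finite() is FALSE for every character cell

# Possible workaround: keep only the numeric columns before converting.
numeric_cols <- vapply(mixed, is.numeric, logical(1))
non_finite_in_numeric <- any(!is.finite(as.matrix(mixed[numeric_cols])))

print(matrix_type)
print(all_flagged)
print(non_finite_in_numeric)
```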
@joran has a good approach, but you'll have problems with factor columns unless you add a method for factors too:
is.finite(letters[1:3]) # FALSE - OK
is.finite(factor(letters[1:3])) # TRUE - WRONG!!
is.finite.factor <- function(obj){
  logical(length(obj))
}
is.finite(factor(letters[1:3])) # FALSE - OK
Also, if you want the check to be as fast as possible, you should avoid sapply and go for vapply instead.
d <- data.frame(matrix(runif(1e6), nrow=10), letters[1:10])
# @joran's method
is.finite.data.frame <- function(obj){
  sapply(obj,FUN = function(x) all(is.finite(x)))
}
system.time( x <- is.finite(d) ) # 0.42 secs
# Using vapply instead...
is.finite.data.frame <- function(obj) {
  vapply(obj,FUN = function(x) all(is.finite(x)), logical(1))
}
system.time( y <- is.finite(d) ) # 0.20 secs
identical(x,y) # TRUE
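If only a single yes/no answer is needed, another option is a plain loop that stops at the first offending column. This is a sketch under one stated assumption: non-numeric columns (character, factor, and so on) are skipped rather than counted as non-finite. The function name has_non_finite is hypothetical.

```r
# Returns TRUE as soon as any numeric column contains NA, NaN, Inf or -Inf.
# Assumption: non-numeric columns are ignored, not treated as non-finite.
has_non_finite <- function(df) {
  for (col in df) {                       # iterating a data.frame yields its columns
    if (is.numeric(col) && !all(is.finite(col))) {
      return(TRUE)                        # early exit: remaining columns are not scanned
    }
  }
  FALSE
}

clean <- data.frame(a = 1:3, b = c("x", "y", "z"))
dirty <- data.frame(a = c(1, NaN, 3), b = c("x", "y", "z"))

print(has_non_finite(clean))
print(has_non_finite(dirty))
```

Because is.numeric() is FALSE for factors, this variant does not need a separate factor method.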
One difference is that base R provides a data.frame method for is.na but not for is.finite. is.na is a generic and will dispatch based on the class of the argument.
> methods("is.na")
[1] is.na.data.frame is.na.numeric_version is.na.POSIXlt
[4] is.na.raster*
Non-visible functions are asterisked
Note in particular that there is an is.na.data.frame function. Looking at that function:
> is.na.data.frame
function (x)
{
y <- do.call("cbind", lapply(x, "is.na"))
if (.row_names_info(x) > 0L)
rownames(y) <- row.names(x)
y
}
<bytecode: 00000000054F40F0>
<environment: namespace:base>
The part that does the work is the do.call("cbind", lapply(x, "is.na")) call, which binds together (cbind) the columns produced by lapply(x, "is.na"). Running just the lapply part with an example data.frame (mtcars):
> lapply(mtcars, "is.na")
$mpg
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$cyl
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$disp
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$hp
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$drat
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$wt
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$qsec
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$vs
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$am
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$gear
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$carb
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
We see that this is really just a column-wise computation, whose results cbind then puts back together into a logical matrix.
Compare that to is.finite which does not have a specific function for data.frames:
> methods("is.finite")
no methods were found
In fact, it is a primitive function, meaning that its default behaviour is implemented in C code, not R code. It is still an internal generic, so S3 methods can be defined for it.
> is.finite
function (x) .Primitive("is.finite")
If you want to do a column-wise computation with is.finite, you can wrap it like is.na.data.frame does.
> do.call(cbind, lapply(mtcars, is.finite))
mpg cyl disp hp drat wt qsec vs am gear carb
[1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[11,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[12,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[13,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[14,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[15,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[17,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[18,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[19,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[20,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[21,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[22,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[23,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[24,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[26,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[27,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[28,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[29,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[30,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[32,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
The same matrix can also be obtained with
sapply(mtcars, is.finite)
No testing on what would be most efficient, though.
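For anyone who wants to compare the variants themselves, here is a rough timing sketch. The test data, the function names via_matrix, via_sapply and via_vapply, and the repetition count are all arbitrary choices; timings depend on the machine and the R version, so no figures are quoted.

```r
# Hypothetical all-numeric test data with a single Inf planted in it.
set.seed(1)
d <- data.frame(matrix(runif(1e5), ncol = 100))
d[5, 7] <- Inf

via_matrix <- function(df) any(!is.finite(as.matrix(df)))
via_sapply <- function(df) any(!sapply(df, is.finite))
via_vapply <- function(df) {
  any(!vapply(df, function(col) all(is.finite(col)), logical(1)))
}

# All three should agree before their speed is compared.
results <- c(matrix = via_matrix(d), sapply = via_sapply(d), vapply = via_vapply(d))
print(results)

# Repeat each call a few times so the elapsed time is measurable.
print(system.time(for (i in 1:20) via_matrix(d)))
print(system.time(for (i in 1:20) via_sapply(d)))
print(system.time(for (i in 1:20) via_vapply(d)))
```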
Related
subsetting by index in R
I have a vector with indexes:
indexes
[1] 25 2 16 23
and another vector with logical values:
logical
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[19] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
I want to keep all logical items except those whose positions are stored in indexes. I thought this would have an easy solution, but mine doesn't work:
for(index in indexes){
  logical[index] = NULL
}
You could just use minus (-) indexing:
indexes <- c(25, 2, 16, 23)
logicals <- sample(c(T,F),25,replace=T)
logicals
#> [1] FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE
#> [13] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
#> [25] FALSE
logicals[-indexes]
#> [1] FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE
#> [13] FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE
Logical vector to see whether an element of a df is contained within a df inside a list
I tried:
mdf$CLAVE.EMISORA %in% BMV[[9]]$`CLAVE EMISORA`
But it only returns:
logical(0)
For some reason the reverse seems to work:
BMV[[9]]$`CLAVE EMISORA` %in% mdf$CLAVE.EMISORA
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[20] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[39] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[58] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[77] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
My data (mdf): I have it but I don't know how to embed it.
My list (BMV): .... I don't know how to copy a list to the clipboard, sorry...
logical(0) is a vector of base type logical with length 0. You're getting this because you're trying to check whether any element of a zero-length vector is present in BMV[[9]]$`CLAVE EMISORA`. If you run
length(mdf$CLAVE.EMISORA)
you'll get 0 as output. The reverse works because you're checking whether each element of a vector of non-zero length is present in a vector of length 0.
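The behaviour described above can be reproduced with made-up values. In this sketch the vector codes and the data frame df are hypothetical, and the misspelt column name is just one way a zero-length left-hand side could arise.

```r
# Hypothetical stand-ins for the two columns being compared.
empty <- character(0)
codes <- c("AC", "ALFA", "AMX")

left_empty  <- empty %in% codes   # one result per element on the left: none
right_empty <- codes %in% empty   # one FALSE per element on the left

# One possible cause: `$` with a column name that does not exist returns NULL.
df <- data.frame(CLAVE.EMISORA = c("AC", "AMX"))
missing_col <- df$`CLAVE EMISORA`   # note the space instead of the dot
from_null <- missing_col %in% codes

print(left_empty)
print(right_empty)
print(is.null(missing_col))
print(length(from_null))
```

%in% always returns a vector as long as its left-hand argument, which is why the direction of the comparison matters.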
Response from generalized linear model is opposite what is expected?
I'm attempting to use n-fold cross-validation to estimate the mcr of a logistic regression classifier, however, the results I am getting are opposite what I'm expecting, and I'm not sure why? Here is my complete R code:
library(ALL); data(ALL); library(caret)
IsB <- ALL$BT
levels(IsB) <- c(rep(TRUE, 5), rep(FALSE, 5))
ALL.names <- ALL[c('39317_at', '38018_g_at'),]
expr.data <- t(exprs(ALL.names))
data.lgr <- data.frame(IsB, expr.data)
n <- dim(data.lgr)[1]
index <- 1:n
K <- n
flds <- createFolds(index, k = K)
mcr.cv.raw <- rep(NA, K)
for (i in 1:K) {
  testID <- flds[[i]]
  data.tr <- data.lgr[-testID,]
  data.test <- data.lgr[testID,]
  reg.lgr <- glm(IsB ~ ., data = data.tr, family = binomial(link = 'logit'))
  pred.prob <- predict(reg.lgr, newdata = data.test, type="response")
  pred.B <- (pred.prob > 0.5)
  mcr.cv.raw[i] <- sum(pred.B != data.test$IsB) / length(pred.B)
}
mcr.cv <- mean(mcr.cv.raw)
mcr.cv
Running this code will output 0.90625, however, this is almost the exact opposite of what I would expect. I think that the problem is coming from the values of pred.prob generated in the for loop.
Logically, I am assuming that the probabilities produced are the probability that the sample in data.test would be classified as true, but when looking at all of the values generated for pred.B vs all of the values in IsB, you can see that they are all opposite what one would expect:
pred.B:
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[21] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[31] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[41] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
[51] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[71] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[91] FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
[101] TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE
[111] FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
IsB:
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[11] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[21] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[41] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[51] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[71] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[81] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[101] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Any help with where my logic or code is failing is appreciated!
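One thing that may be worth checking here is the level order of the response. With a factor response and a binomial family, glm() treats the first level as failure, so predict(..., type = "response") gives the probability of the other level. The sketch below uses a small made-up data set (x, truth) rather than the ALL data, and reproduces only the level order "TRUE", "FALSE" that the levels<- assignment in the question creates.

```r
# Hypothetical toy data: TRUE is more common for small x, FALSE for large x.
x     <- c(1, 2, 3, 4, 5, 6, 7, 8)
truth <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Same level order as in the question: "TRUE" first, "FALSE" second.
y <- factor(truth, levels = c("TRUE", "FALSE"))

fit <- glm(y ~ x, family = binomial(link = "logit"))
p   <- predict(fit, type = "response")   # probability of levels(y)[2], i.e. "FALSE"

modelled_level <- levels(y)[2]
mcr_as_in_question <- mean((p > 0.5) != truth)        # treats p as P(TRUE)
mcr_flipped        <- mean(((1 - p) > 0.5) != truth)  # treats 1 - p as P(TRUE)

print(modelled_level)
print(c(as_in_question = mcr_as_in_question, flipped = mcr_flipped))
```

If this is what is happening in the original code, reordering the factor levels or comparing against 1 - pred.prob would be two ways to line the predictions up with IsB.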
R Programming- Permutation regarding repetition and order
I have searched for and attempted solutions with combn and the gtools library, to no avail. I want to take the following vector:
x<-c(TRUE,FALSE)
and have it look like the following output:
Permutations with repetition (n=2, r=5)
Using Items: t,f
List has 32 entries.
{t,t,t,t,t} {t,t,t,t,f} {t,t,t,f,t} {t,t,t,f,f}
{t,t,f,t,t} {t,t,f,t,f} {t,t,f,f,t} {t,t,f,f,f}
{t,f,t,t,t} {t,f,t,t,f} {t,f,t,f,t} {t,f,t,f,f}
{t,f,f,t,t} {t,f,f,t,f} {t,f,f,f,t} {t,f,f,f,f}
{f,t,t,t,t} {f,t,t,t,f} {f,t,t,f,t} {f,t,t,f,f}
{f,t,f,t,t} {f,t,f,t,f} {f,t,f,f,t} {f,t,f,f,f}
{f,f,t,t,t} {f,f,t,t,f} {f,f,t,f,t} {f,f,t,f,f}
{f,f,f,t,t} {f,f,f,t,f} {f,f,f,f,t} {f,f,f,f,f}
Any suggestions? I am quite a newbie at this, so any help is appreciated. I used the following online calculator to produce the output above:
https://www.mathsisfun.com/combinatorics/combinations-permutations-calculator.html
Thanks!
Using the gtools library, I believe this is:
library(gtools)
permutations(2,5,v=c(TRUE,FALSE),repeats.allowed=TRUE)
##       [,1]  [,2]  [,3]  [,4]  [,5]
##  [1,] FALSE FALSE FALSE FALSE FALSE
##  [2,] FALSE FALSE FALSE FALSE  TRUE
##  [3,] FALSE FALSE FALSE  TRUE FALSE
##  [4,] FALSE FALSE FALSE  TRUE  TRUE
##  [5,] FALSE FALSE  TRUE FALSE FALSE
##  [6,] FALSE FALSE  TRUE FALSE  TRUE
##  [7,] FALSE FALSE  TRUE  TRUE FALSE
##  [8,] FALSE FALSE  TRUE  TRUE  TRUE
##  [9,] FALSE  TRUE FALSE FALSE FALSE
## [10,] FALSE  TRUE FALSE FALSE  TRUE
## [11,] FALSE  TRUE FALSE  TRUE FALSE
## [12,] FALSE  TRUE FALSE  TRUE  TRUE
## [13,] FALSE  TRUE  TRUE FALSE FALSE
## [14,] FALSE  TRUE  TRUE FALSE  TRUE
## [15,] FALSE  TRUE  TRUE  TRUE FALSE
## [16,] FALSE  TRUE  TRUE  TRUE  TRUE
## [17,]  TRUE FALSE FALSE FALSE FALSE
## [18,]  TRUE FALSE FALSE FALSE  TRUE
## [19,]  TRUE FALSE FALSE  TRUE FALSE
## [20,]  TRUE FALSE FALSE  TRUE  TRUE
## [21,]  TRUE FALSE  TRUE FALSE FALSE
## [22,]  TRUE FALSE  TRUE FALSE  TRUE
## [23,]  TRUE FALSE  TRUE  TRUE FALSE
## [24,]  TRUE FALSE  TRUE  TRUE  TRUE
## [25,]  TRUE  TRUE FALSE FALSE FALSE
## [26,]  TRUE  TRUE FALSE FALSE  TRUE
## [27,]  TRUE  TRUE FALSE  TRUE FALSE
## [28,]  TRUE  TRUE FALSE  TRUE  TRUE
## [29,]  TRUE  TRUE  TRUE FALSE FALSE
## [30,]  TRUE  TRUE  TRUE FALSE  TRUE
## [31,]  TRUE  TRUE  TRUE  TRUE FALSE
## [32,]  TRUE  TRUE  TRUE  TRUE  TRUE
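If installing gtools is not an option, the same 32 combinations can be sketched in base R with expand.grid(). This is an alternative to the answer above, not what it uses, and the row order differs because expand.grid() varies the first column fastest. The names grid and perms are illustrative.

```r
# Five copies of c(TRUE, FALSE), one per position; expand.grid() builds every combination.
grid <- expand.grid(rep(list(c(TRUE, FALSE)), 5))

# Convert to a logical matrix, comparable to what permutations() returns.
perms <- as.matrix(grid)

print(dim(perms))
print(head(perms, 4))
```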
Displaying which rows are duplicates in R? [duplicate]
I have a dataset in a txt file that has thousands of lines, each row containing 6 entries:
27.952555 4.023447 61.275883 -0.305102 -0.869921 -1.222882
27.952555 4.617039 60.936607 -0.296737 -0.369152 -1.435724
Is there a way I can check if there are any identical rows in R, such as if I came across the last line below?
27.952555 4.023447 61.275883 -0.305102 -0.869921 -1.222882
27.952555 4.617039 60.936607 -0.296737 -0.369152 -1.435724
...
27.952555 4.023447 61.275883 -0.305102 -0.869921 -1.222882
How can I display this duplicate?
Use duplicated:
duplicated(iris)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# [145] FALSE FALSE FALSE FALSE FALSE FALSE
Possibly in tandem with which to see row numbers:
which(duplicated(iris))
# [1] 143
Or [ extraction to see the rows themselves:
iris[duplicated(iris),]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 143 5.8 2.7 5.1 1.9 virginica
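duplicated() marks only the second and later copies of a row, so the first occurrence is not flagged. To see every copy, one option is to combine a forward and a backward pass. A minimal sketch with made-up data (df, dup_later and dup_any are illustrative names):

```r
# Hypothetical data: rows 1 and 3 are identical.
df <- data.frame(a = c(1, 2, 1, 3), b = c("x", "y", "x", "z"))

dup_later <- duplicated(df)                                  # later copies only
dup_any   <- duplicated(df) | duplicated(df, fromLast = TRUE) # every copy

print(which(dup_later))
print(which(dup_any))
print(df[dup_any, ])
```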