I'm trying to run a randomForest on a large-ish data set (5000x300). Unfortunately I'm getting an error message as follows:
> RF <- randomForest(prePrior1, postPrior1[,6]
+ ,,do.trace=TRUE,importance=TRUE,ntree=100,,forest=TRUE)
Error in randomForest.default(prePrior1, postPrior1[, 6], , do.trace = TRUE, :
NA/NaN/Inf in foreign function call (arg 1)
So I tried to find any NAs using:
> df2 <- prePrior1[is.na(prePrior1)]
> df2
character(0)
> df2 <- postPrior1[is.na(postPrior1[,6])]
> df2
numeric(0)
which leads me to believe that it's Inf's that are the problem as there don't seem to be any NA's.
Any suggestions for how to root out Inf's?
You're probably looking for is.finite, though I'm not 100% certain that the problem is Infs in your input data.
Be sure to read the help for is.finite carefully about which combinations of missing, infinite, etc. it picks out. Specifically, this:
> is.finite(c(1,NA,-Inf,NaN))
[1] TRUE FALSE FALSE FALSE
> is.infinite(c(1,NA,-Inf,NaN))
[1] FALSE FALSE TRUE FALSE
One of these things is not like the others. Not surprisingly, there's an is.nan function as well.
randomForest's 'NA/NaN/Inf in foreign function call' error is often misleading, and really irritating:
you will get it if any of the variables passed is character
actual NaNs and Infs almost never occur in clean data
My quick-and-dirty trick to narrow things down: do a binary search on your variable list, and use token parameters like ntree=2 to get an instant pass/fail on each subset of variables:
RF <- randomForest(prePrior1[m:n],ntree=2,...)
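Since character columns are named above as a common trigger, it may be quicker to look at the column types before bisecting. A minimal sketch, assuming a small made-up data frame `toy` standing in for prePrior1 (the name and its contents are hypothetical); converting to factor is one possible fix, not necessarily the right one for every column:

```r
# Hypothetical stand-in for prePrior1: one numeric and one character column
toy <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Flag the character columns
is_chr <- vapply(toy, is.character, logical(1))
names(toy)[is_chr]

# One possible fix (an assumption): turn character columns into factors
toy[is_chr] <- lapply(toy[is_chr], factor)
sapply(toy, class)
```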
In analogy to is.na, you can use is.infinite to find occurrences of infinites.
Take a look at with, e.g.:
> with(df, df == Inf)
foo bar baz abc ...
[1,] FALSE FALSE TRUE FALSE ...
[2,] FALSE TRUE FALSE FALSE ...
...
joran's answer is what you want, and it is informative. For more details about is.na() and is.infinite(), check out https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/is.na-methods.html
Besides, once you have the logical vector that says whether each element of the original vector is NA/Inf, you can use the which() function to get the indices, like this:
> v1 <- c(1, Inf, 2, NaN, Inf, 3, NaN, Inf)
> is.infinite(v1)
[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
> which(is.infinite(v1))
[1] 2 5 8
> is.na(v1)
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
> which(is.na(v1))
[1] 4 7
The documentation for which() is here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/any.html
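For two-dimensional data, which() can also report row and column positions through its arr.ind argument. A small sketch on a made-up matrix `m` (hypothetical data); an all-numeric data frame would probably need as.matrix() first, since is.infinite() has no data frame method:

```r
# Hypothetical 2 x 3 matrix with two infinite entries
m <- matrix(c(1, Inf, 3, 4, 5, -Inf), nrow = 2)

# Row/column positions of the infinite values
pos <- which(is.infinite(m), arr.ind = TRUE)
pos
```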
I'm trying to generate a corrplot with numbers in R and the following error appears: "Error in is.finite(tmp) : default method not implemented for type 'list'"
I tried transforming my list into data frame, but for sure I'm doing something wrong.
Could anyone help me to solve it?
Thanks in advance
Note: I put a picture showing the "x"
Below is my code:
library(corrplot)
x<-read.table("corrplot_2.txt")
x <- as.data.frame(x)
corrplot(x, method="number")
The problem is that the input of corrplot should be a correlation matrix:
The correlation matrix to visualize, must be square if order is not
'original'. For general matrix, please using is.corr = FALSE to
convert.
So you first have to convert your dataframe to a correlation matrix using cor like this:
library(corrplot)
x<-read.table("corrplot_2.txt")
x <- as.data.frame(x)
M <- cor(x)
corrplot(M, method="number")
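The poster's file is not available, so here is a sketch of the same idea using the built-in mtcars data as a stand-in (an assumption). It also keeps only numeric columns before calling cor(), which may matter if the real file contains text columns:

```r
# mtcars stands in for the poster's data, which is not available here
num_cols <- sapply(mtcars, is.numeric)
M <- cor(mtcars[num_cols])
dim(M)  # square: one row and one column per numeric variable

# Plot only if the corrplot package happens to be installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(M, method = "number")
}
```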
Problem
You receive this error because is.finite() cannot handle data frames / lists:
# example
df <- as.data.frame(matrix(runif(30), ncol=5))
is.finite(df)
> is.finite(df)
Error in is.finite(df) : default method not implemented for type 'list'
Solution
The functions lapply() or sapply() allow us to use is.finite() for every column:
# solution 1
lapply(df, is.finite)
sapply(df, is.finite)
> lapply(df, is.finite)
$V1
[1] TRUE TRUE TRUE TRUE TRUE TRUE
$V2
[1] TRUE TRUE TRUE TRUE TRUE TRUE
...
> sapply(df, is.finite)
V1 V2 V3 V4 V5
[1,] TRUE TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE TRUE
...
Since we probably don't want an answer for each value individually, but we want to know if all of them are finite, we add an all():
# solution 2
all(sapply(df, is.finite))
> all(sapply(df, is.finite))
[1] TRUE
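When the overall check comes back FALSE, it can help to see which columns are responsible. A minimal sketch on a made-up data frame `df_bad` (hypothetical names and values):

```r
# Hypothetical data frame: V2 contains Inf, V3 contains NA
df_bad <- data.frame(
  V1 = c(1, 2, 3),
  V2 = c(1, Inf, 3),
  V3 = c(NA, 2, 3)
)

# One TRUE/FALSE per column: are all of its values finite?
col_ok <- vapply(df_bad, function(col) all(is.finite(col)), logical(1))
col_ok

# Names of the offending columns
names(df_bad)[!col_ok]
```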
If I understand, rlang::quo_is_missing evaluates a quosure and checks whether it contains a missing value. If it does, it should return TRUE, FALSE if not. Yet, I've tried the following combinations and it always returns FALSE:
rlang::quo_is_missing(quo(NA))
rlang::quo_is_missing(quo(NA_character_))
rlang::quo_is_missing(quo(NA_integer_))
If I try non-NA values, it also returns FALSE, as expected:
rlang::quo_is_missing(quo("hello"))
Why is it returning FALSE when the value is obviously missing?
"Missing" is a special term that refers to values that are not present at all. NA is not the same as "missing" -- NA is itself a value. In base R you can compare the functions is.na() and missing() each of which do different things. quo_is_missing is like the missing() function, not is.na and returns true only when there is no value at all:
rlang::quo_is_missing(quo())
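The same distinction can be seen in base R alone. A small sketch, where `has_no_arg` is a hypothetical helper written just for this illustration:

```r
# Hypothetical helper: reports whether the argument was supplied at all
has_no_arg <- function(x) missing(x)

has_no_arg()    # TRUE: no value was passed
has_no_arg(NA)  # FALSE: NA is a value that was passed
is.na(NA)       # TRUE: the value passed happens to be NA
```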
If you want to check for NA, you could write a helper:
quo_is_na <- function(x) {
  !rlang::quo_is_symbolic(x) &&
    !rlang::quo_is_missing(x) &&
    !rlang::quo_is_null(x) &&
    is.na(rlang::quo_get_expr(x))
}
quo_is_na(quo())
# [1] FALSE
quo_is_na(quo(x+y))
# [1] FALSE
quo_is_na(quo(NULL))
# [1] FALSE
quo_is_na(quo(42))
# [1] FALSE
quo_is_na(quo(NA))
# [1] TRUE
quo_is_na(quo(NA_character_))
# [1] TRUE
I have a dataframe object that is presorted, and I am trying to call a function that requires it to be sorted. Somehow is.unsorted() is returning true. R then proceeds to sort it.
Unfortunately, there are about 2million entries, and I don't have the memory. Is there a way to force is.unsorted to be false?
A quick check of the documentation for is.unsorted turns up the following note:
Note:
This function is designed for objects with one-dimensional indices, as described above. Data frames, matrices and other arrays may give surprising results.
Therefore, you should avoid using this function on complete data frames. Instead, run it on individual columns of the data frame.
Take the code snippet below, for example. The function works as expected on one-dimensional objects (vectors), but gives a surprising result when run on a data frame (it returns FALSE where TRUE is expected).
However, when the data frame is subset (using the $ operator) and is.unsorted() is run on the individual columns, it returns the expected result.
> vec <- c(1,2,3,4,5)
> is.unsorted(vec) # Expected: FALSE
[1] FALSE
> vec <- c(1,3,2,5,4)
> is.unsorted(vec) # Expected: TRUE
[1] TRUE
> vec <- c("A","B","C","D","E")
> is.unsorted(vec) # Expected: FALSE
[1] FALSE
> vec <- c("A","C","B","E","D")
> is.unsorted(vec) # Expected: TRUE
[1] TRUE
> dat <- data.frame(num=c(1,2,3,4,5)
+ ,chr=c("A","B","C","D","E")
+ ,stringsAsFactors=FALSE
+ )
> is.unsorted(dat) # Expected: FALSE
[1] FALSE
> dat <- data.frame(num=c(1,3,2,5,4)
+ ,chr=c("A","B","C","D","E")
+ ,stringsAsFactors=FALSE
+ )
> is.unsorted(dat) # Expected: TRUE
[1] FALSE
> is.unsorted(dat$num) # Expected: TRUE
[1] TRUE
> is.unsorted(dat$chr) # Expected: FALSE
[1] FALSE
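If the data frame is supposed to be sorted by more than one column, one possible check is to ask whether order() on those columns already returns 1, 2, ..., n. This is a sketch only: `is_sorted_by` and `dat_keys` are hypothetical names, and it relies on order() leaving ties in their original positions.

```r
# Hypothetical data frame sorted by grp, then num
dat_keys <- data.frame(
  grp = c("A", "A", "B", "B"),
  num = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

# TRUE if the rows are already in the order that order() would produce
is_sorted_by <- function(d, cols) {
  ord <- do.call(order, unname(as.list(d[cols])))
  !is.unsorted(ord)
}

is_sorted_by(dat_keys, c("grp", "num"))

# Swap two values so the rows are no longer sorted
dat_mixed <- dat_keys
dat_mixed$num <- c(2, 1, 1, 3)
is_sorted_by(dat_mixed, c("grp", "num"))
```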
I have a data frame named newdata. It has two columns named BONUS and GENDER.
When I write the following code in R:
> newdata <- within(newdata,{
PROMOTION=ifelse(BONUS>=1500,1,0)})
it works even though I haven't used a loop, but the following code doesn't work without a loop. Why?
> add <- with(newdata,
if(GENDER==F)sum(PROMOTION))
Warning message:
In if (GENDER == F) sum(PROMOTION) :
the condition has length > 1 and only the first element will be used
My question is: why were all elements used in the first case?
ifelse is vectorized, but if is not. For example:
> x <- rbinom(20,1,.5)
> ifelse(x,TRUE,FALSE)
[1] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
[13] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
> if(x) {TRUE} else {FALSE}
[1] TRUE
Warning message:
In if (x) { :
the condition has length > 1 and only the first element will be used
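For the sum in the question, logical indexing avoids if altogether. A minimal sketch on made-up data, assuming GENDER is stored as the strings "F" and "M" (an assumption; an unquoted F in R means FALSE). Note that from R 4.2.0 onward, a condition of length greater than one in if() is an error rather than a warning.

```r
# Hypothetical data in the shape described in the question
newdata <- data.frame(
  BONUS  = c(1000, 1600, 2000, 1200),
  GENDER = c("F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Vectorised flag, as in the question
newdata <- within(newdata, PROMOTION <- ifelse(BONUS >= 1500, 1, 0))

# Sum PROMOTION over the rows where GENDER is "F"
add <- with(newdata, sum(PROMOTION[GENDER == "F"]))
add
```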
In my R script...
I have an object myObject which is something that looks like this:
> myObject
m convInfo data call dataClasses control
FALSE FALSE FALSE FALSE FALSE FALSE
It is what is returned from an is.na(obj) where obj is an nls fit.
I'm trying to test if that first item is FALSE rather than TRUE. How can I extract that out? I tried myObject$m but that didn't work.
You have a named (logical) vector.
> v <- 1:5
> names(v) <- LETTERS[1:5]
> is.na(v)
A B C D E
FALSE FALSE FALSE FALSE FALSE
> myObj <- .Last.value
You address it like any other atomic vector:
> myObj[1]
A
FALSE
> myObj[1] == FALSE
A
TRUE
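Elements can also be pulled out by name, and [[ ]] drops the name from the result. A small sketch on a made-up vector `flags` (hypothetical values); isFALSE() is available from R 3.5.0:

```r
# Hypothetical named logical vector, similar in shape to the one in the question
flags <- c(m = FALSE, convInfo = FALSE, data = TRUE)

flags["m"]             # keeps the name
flags[["m"]]           # single element, name dropped
isFALSE(flags[["m"]])  # TRUE only when the value is a single FALSE
any(flags)             # is any element TRUE?
```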
The object returned by nls() is a list. The behaviour of is.na() on a list is somewhat peculiar in terms of what is and is not NA. From ?is.na:
Value:
The default method for ‘is.na’ applied to an atomic vector returns
a logical vector of the same length as its argument ‘x’,
containing ‘TRUE’ for those elements marked ‘NA’ or, for numeric
or complex vectors, ‘NaN’ (!) and ‘FALSE’ otherwise. ‘dim’,
‘dimnames’ and ‘names’ attributes are preserved.
The default method also works for lists and pairlists: the result
for an element is false unless that element is a length-one atomic
vector and the single element of that vector is regarded as ‘NA’
or ‘NaN’.
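The quoted rule for lists can be checked on a small made-up list (`lst` is hypothetical); only length-one atomic elements holding NA or NaN come back TRUE:

```r
# Hypothetical list mixing length-one and longer elements
lst <- list(a = NA, b = 1:2, c = c(NA, NA), e = NaN)

# One logical per list element, names preserved
res <- is.na(lst)
res
```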
So your object (called t here) is a logical vector, with its TRUE and FALSE values determined as per the quoted text above. Therefore all of
t[1]
t["m"]
head(t, 1)
extract the first element of t. If you want to test for FALSE then I might try:
!isTRUE(t[1])
E.g.
> set.seed(1)
> logi <- sample(c(TRUE,FALSE), 5, replace = TRUE)
> logi
[1] TRUE TRUE FALSE FALSE TRUE
> !isTRUE(logi[1])
[1] FALSE
The reason the $ version won't work is that $ is documented to apply only to non-atomic vectors. logi (or your t) is an atomic vector, in that it contains elements of the same type.
> is.atomic(logi)
[1] TRUE
> names(logi) <- letters[1:5]
> logi$a
Error in logi$a : $ operator is invalid for atomic vectors
> logi["a"]
a
TRUE