I had a bug in my code resulting from an inadvertent comparison between a character variable and a numeric variable (they were both supposed to be numeric). This bug would have been much easier to find if R had a warning when doing this type of comparison. For example, why does this not throw a warning
> 'two' < 5
[1] FALSE
but this does throw a warning
> as.numeric('two') < 5
[1] NA
Warning message:
NAs introduced by coercion
It is not clear to me what is going on behind the scenes in the first comparison.
In your example, 5 is converted to a character, so the test is the same as 'two' < as.character(5).
From ?Comparison:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
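So the coercion order above explains the silent FALSE: both operands become character strings and are compared lexicographically. A short sketch of the consequences:

```r
# 5 is coerced to "5", so the comparison is between two strings:
'two' < 5        # FALSE here: digits sort before letters in the C locale
'two' < "5"      # the same comparison, written explicitly

# Lexicographic order is surprising for numbers stored as strings:
'10' < '9'       # TRUE, because "1" sorts before "9"
```

Note that string collation is locale-dependent, so the exact ordering of letters relative to digits can vary between systems.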
Related
In a dataframe, I have a column that has numeric values and some mixed in character data for some rows. I want to remove all rows with the character data and keep those rows with a number value. The df I have is 6 million rows, so I simply made a small object to try to solve my issue and then implement at a larger scale.
Here is what I did:
a <- c("fruit", "love", 53)
b <- str_replace_all("^[:alpha:]", 0)
Reading answers to other UseMethod errors on here (about factors), I tried to change "a" to as.character(a) and attempt "b" again. But, I get the same error. I'm trying to simply make any alphabetic value into the number zero and I'm fairly new at all this.
There are several issues here, even in these two lines of code. First, a is a character vector, because its first element is a character. This means that your numeric 53 is coerced into a character.
> print(a)
[1] "fruit" "love" "53"
Second, your call to str_replace_all has the wrong syntax: the string to modify must be the first argument. See the documentation for how to use it correctly. But that's not what you want here anyway, because you want numerics.
The first thing you need to do is convert a to a numeric. A crude way of doing this is simply
> b <- as.numeric(a)
Warning message:
NAs introduced by coercion
> b
[1] NA NA 53
And then subset to include only the numeric values in b:
> b <- b[!is.na(b)]
> b
[1] 53
But whether that's what you want to do with a 6 million row dataframe is another matter. Please think about exactly what you would like to do, supply us with better test data, and ask your question again.
There's probably a more efficient way of doing this on a large data frame (e.g. something column-wise, instead of row-wise), but to answer your specific question about each row a:
as.numeric(stringr::str_replace_all(a, "[a-z]+", "0"))
Note that the replacing value must be a character (the last argument in the function call, "0"). (You can look up the documentation from your R-console by: ?stringr::str_replace_all)
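If the real goal is to keep only the rows whose column parses as a number, a minimal sketch for a data frame (the column name x is made up for illustration) is to coerce the column once and drop the rows where coercion fails:

```r
# Toy data frame standing in for the 6-million-row case;
# the column name `x` is made up for illustration.
df <- data.frame(x = c("fruit", "love", "53", "7.2"), stringsAsFactors = FALSE)

# Coerce once; suppressWarnings() silences the expected
# "NAs introduced by coercion" message.
nums <- suppressWarnings(as.numeric(df$x))

# Keep only the rows where coercion succeeded.
df_clean <- df[!is.na(nums), , drop = FALSE]
df_clean$x <- nums[!is.na(nums)]
df_clean$x
# [1] 53.0  7.2
```

This coerces the whole column once instead of pattern-matching row by row, which matters at 6 million rows.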
I am trying to create a logical vector in R, which will indicate for every value of a complete vector, if it is numeric or not.
I am trying to use the function is.numeric, but it only checks whether the whole vector is numeric, like this:
vec<-c(1,2,3,"lol")
t <- is.numeric(vec)
t
will produce FALSE
I looked here, but it only explains how to check the entire vector and get a single value.
I looked here, but the issue is not finite vs. infinite.
I am trying to take a data set in which some values are numbers and others are strings that mean "no value", and find the minimum over the numeric values only. For that I am trying to create a logical vector that says, for every entry, whether it is numeric or not. I would like to build that vector without an explicit loop if possible.
We can use numeric coercion to our advantage. R will message us to be sure that we meant to change the strings to NA. In this case, it is exactly what we are looking for:
!is.na(as.numeric(vec))
#[1] TRUE TRUE TRUE FALSE
#Warning message:
#NAs introduced by coercion
We can use grepl to get a logical vector. We match strings that consist only of digits from start (^) to end ($). I also included the possibility of negative and floating-point numbers.
grepl('^-?[0-9.]+$', vec)
#[1] TRUE TRUE TRUE FALSE
NOTE: There will be no warning messages.
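Either logical vector can then be used for the stated goal of finding the minimum over the numeric entries only. A minimal sketch, using the coercion approach:

```r
# Mixed vector: note it is character throughout, because of "lol".
vec <- c(1, 2, 3, "lol")

# TRUE where the entry parses as a number.
is_num <- !is.na(suppressWarnings(as.numeric(vec)))

# Minimum over the numeric entries only.
min(as.numeric(vec[is_num]))
# [1] 1
```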
I have a matrix composed by floating point numbers. I have checked and there are no missing values.
Recently, I have changed the column header from (e.g.):
2670
to
COLUMN-HEADER-A-2670
I am running the code provided by Nearest Template Prediction algorithm, that I cannot change.
I found that the error probably occurs when computing the mean across columns for each row of the matrix, i.e.
exp.mean <- apply(exp.dataset,1,mean,na.rm=T)
The mean for all rows is forced to NA and the R console tells me
Browse[2]> exp.mean <- apply(exp.dataset,1,mean,na.rm=T)
There were 50 or more warnings (use warnings() to see the first 50)
1: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
I think it is related with header type, but I cannot find anything that explains it. The algorithm worked with the previous column notation.
The answer is that, as of R 3.1.0, read.table() returns a character vector instead of a numeric vector if representing the table as a double matrix would lose accuracy.
From here:
type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.
If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
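A minimal sketch of that advice, using a temporary file in place of the real data:

```r
# Write a column containing a value with more decimal places than a
# double can represent exactly.
tf <- tempfile(fileext = ".txt")
writeLines(c("2670", "1.23456789012345678901", "42.5"), tf)

# colClasses = "numeric" forces numeric parsing, so the column is not
# returned as character to preserve the unrepresentable digits.
x <- read.table(tf, colClasses = "numeric")
class(x$V1)
# [1] "numeric"
unlink(tf)
```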
In the console, go ahead and try
> sum(sapply(1:99999, function(x) { x != as.character(x) }))
[1] 0
For all values 1 through 99999, "1" == 1, "2" == 2, ..., "99999" == 99999 are TRUE. However,
> 100000 == "100000"
[1] FALSE
Why does R have this quirky behavior, and is this a bug? What would be a workaround to, e.g., check if every element in an atomic character vector is in fact numeric? Right now I was trying to check whether x == as.numeric(x) for each x, but that fails on certain datasets due to the above problem!
Have a look at as.character(100000). Its value is not equal to "100000" (have a look for yourself), and R is essentially just telling you so.
as.character(100000)
# [1] "1e+05"
Here, from ?Comparison, are R's rules for applying relational operators to values of different types:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
Those rules mean that when you test whether 1=="1", say, R first converts the numeric value on the LHS to a character string, and then tests for equality of the two character strings. In some cases those will be equal, but in others they will not. Which cases produce inequality will depend on the current settings of options("scipen") and options("digits").
So, when you type 100000=="100000", it is as if you were actually performing the following test. (Note that internally, R may well use something other than as.character() to perform the conversion):
as.character(100000)=="100000"
# [1] FALSE
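As for the workaround asked about above: one robust option (a sketch, not the only way) is to coerce in the other direction, character to numeric, so the comparison does not depend on scipen/digits formatting:

```r
# Check that every element of a character vector parses as a number,
# by coercing character -> numeric rather than numeric -> character.
all_numeric <- function(x) {
  stopifnot(is.character(x))
  !anyNA(suppressWarnings(as.numeric(x)))
}

all_numeric(c("1", "100000", "1e+05"))   # TRUE
all_numeric(c("1", "two"))               # FALSE
```

Here "1e+05" passes because as.numeric() understands scientific notation, which is exactly the representation that broke the string comparison above.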