Why doesn't comparison between numeric and character variables give a warning?

I had a bug in my code resulting from an inadvertent comparison between a character variable and a numeric variable (they were both supposed to be numeric). This bug would have been much easier to find if R gave a warning for this type of comparison. For example, why does this not throw a warning:
> 'two' < 5
[1] FALSE
but this does throw a warning:
> as.numeric('two') < 5
[1] NA
Warning message:
NAs introduced by coercion
What is going on behind the scenes in the first comparison?

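In your example, 5 is converted to a character string, so the test is the same as 'two' < as.character(5), i.e. 'two' < '5'. That is a string comparison, which returns FALSE because digits sort before letters in the usual collating sequences. Converting a number to a string always succeeds, so there is nothing to warn about.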
From ?Comparison:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
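To make the coercion visible, here is a small R sketch. strict_lt() is a hypothetical helper written for illustration, not a base R or package function; it refuses to compare unless both sides are numeric, which would have surfaced the bug from the question as an error instead of a silent FALSE.

```r
# Hypothetical helper (not part of base R or any package): compare two
# vectors with `<`, but refuse to run unless both sides are numeric.
strict_lt <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("strict_lt(): expected two numeric arguments, got ",
         class(x)[1], " and ", class(y)[1], call. = FALSE)
  }
  x < y
}

# The silent coercion: 5 is turned into "5", so these are the same test
identical('two' < 5, 'two' < as.character(5))  # TRUE

strict_lt(3, 5)            # TRUE
try(strict_lt('two', 5))   # reports an error instead of returning FALSE
```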

Related

UseMethod("type") error; no applicable method for 'type' applied to an object of class "c('double', 'numeric')"

In a data frame, I have a column that holds numeric values, with character data mixed in on some rows. I want to remove all rows with character data and keep the rows with a numeric value. The data frame has 6 million rows, so I made a small object to solve the issue first and then apply the solution at a larger scale.
Here is what I did:
a <- c("fruit", "love", 53)
b <- str_replace_all("^[:alpha:]", 0)
After reading answers to other UseMethod errors on here (about factors), I tried changing a to as.character(a) and running the b line again, but I get the same error. I'm simply trying to turn any alphabetic value into the number zero, and I'm fairly new at all this.
There are several issues here, even in these two lines of code. First, a is a character vector: c() coerces all of its arguments to a common type, and because some elements are character strings, your numeric 53 is coerced to the string "53".
> print(a)
[1] "fruit" "love" "53"
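You've got the wrong syntax for str_replace_all(): its arguments are str_replace_all(string, pattern, replacement), and your call omits the string, so "^[:alpha:]" is taken as the string and 0 as the pattern, which is what triggers the UseMethod("type") error. But str_replace_all() isn't what you want here anyway, because you want numbers.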
The first thing you need to do is convert a to numeric. A crude way of doing this is simply
> b <- as.numeric(a)
Warning message:
NAs introduced by coercion
> b
[1] NA NA 53
And then subset to include only the numeric values in b:
> b <- b[!is.na(b)]
> b
[1] 53
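The same idea can be applied to a whole data frame column in one vectorized step, which should scale to millions of rows. This is a sketch assuming a hypothetical data frame df with a character column value; note that it also drops rows whose value was already NA.

```r
# Hypothetical stand-in for the large data frame: a character column
# mixing numbers with words
df <- data.frame(id = 1:4,
                 value = c("fruit", "love", "53", "7.5"),
                 stringsAsFactors = FALSE)

# Parse the whole column at once; strings that are not numbers become NA.
# The coercion warning is expected here, so it is suppressed deliberately.
num <- suppressWarnings(as.numeric(df$value))

# Keep only the rows that parsed, and store the numeric values in place
keep <- !is.na(num)
clean <- df[keep, ]
clean$value <- num[keep]
clean
```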
But whether that's what you want to do with a 6-million-row data frame is another matter. Please think about exactly what you would like to do, supply us with better test data, and ask your question again.
There's probably a more efficient way of doing this on a large data frame (e.g., working column-wise instead of row-wise), but to answer your specific question about the vector a:
as.numeric(stringr::str_replace_all(a, "[a-z]+", "0"))
Note that the replacement value must be a character string (the last argument in the function call, "0"). (You can look up the documentation from the R console with ?stringr::str_replace_all.)
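If stringr isn't available, base R's gsub() can do the same replacement. This sketch swaps the [a-z] class for the POSIX class [[:alpha:]], which also matches uppercase letters; the extra "Kiwi" element is added only for illustration.

```r
a <- c("fruit", "love", 53, "Kiwi")

# [[:alpha:]] matches upper- and lowercase letters; [a-z] is not guaranteed
# to match the "K" in "Kiwi", and a leftover "K0" cannot be parsed as a number
b <- as.numeric(gsub("[[:alpha:]]+", "0", a))
b   # 0 0 53 0
```

Strings that mix letters and digits, such as "abc12", become "012" and therefore 12, which may or may not be what you want.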

Problems coercing factors into integers

I am trying to convert a factor (tickets_other) in a data frame (p2) into an integer. Following the R help guide, as well as advice from others, this code should work:
as.numeric(levels(p2$tickets_other))[p2$tickets_other]
The column does contain NAs, and so I get a warning:
Warning message:
NAs introduced by coercion
Which is fine, but after coercing it to numeric, it still reads as a factor:
class(p2$tickets_other)
[1] "factor"
The same result happens if I use as.numeric(as.character()):
as.numeric(as.character(p2$tickets_other))
Warning message:
NAs introduced by coercion
class(p2$tickets_other)
[1] "factor"
You're doing:
as.numeric(levels(p2$tickets_other))[p2$tickets_other]
You are reading the levels from p2$tickets_other (a vector), converting them to numeric (still a vector), and then accessing the elements of that vector according to the values in p2$tickets_other.
I can't imagine this is what you really want to do.
Maybe just
as.numeric(p2$tickets_other)
is what you want?
I fixed the problem. It was actually very simple. The command:
as.numeric(levels(p2$tickets_other))[p2$tickets_other]
is correct, but I failed to store the result:
p2$tickets_other <- as.numeric(levels(p2$tickets_other))[p2$tickets_other]
Simple mistake, in retrospect. Thanks to DMT for the suggestion.
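To see why the levels idiom is needed, here is a small R sketch with a hypothetical factor f. as.numeric() on a factor returns its internal integer codes, while as.numeric(levels(f))[f] maps each code back to the number its level represents; p2 here is a hypothetical stand-in for the asker's data frame.

```r
# Hypothetical factor whose levels look like numbers
f <- factor(c("10", "5", "20", "5"))
levels(f)                   # "10" "20" "5" (levels are sorted as strings)

as.numeric(f)               # 1 3 2 3: the internal integer codes, not the values
as.numeric(levels(f))[f]    # 10 5 20 5: indexing by a factor uses its codes

# The conversion only sticks once it is assigned back to the column
p2 <- data.frame(tickets_other = f)
p2$tickets_other <- as.numeric(levels(p2$tickets_other))[p2$tickets_other]
class(p2$tickets_other)     # "numeric"
```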

Why does R return NA when computing the mean of a non-null matrix?

I have a matrix of floating-point numbers. I have checked, and there are no missing values.
Recently, I changed the column headers from, e.g.,
2670
to
COLUMN-HEADER-A-2670
I am running the code provided with the Nearest Template Prediction algorithm, which I cannot change.
I found that the error probably occurs when computing the mean across columns for each row of the matrix, i.e.
exp.mean <- apply(exp.dataset,1,mean,na.rm=T)
The mean of every row is forced to NA, and the R console tells me:
Browse[2]> exp.mean <- apply(exp.dataset,1,mean,na.rm=T)
There were 50 or more warnings (use warnings() to see the first 50)
1: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
I think it is related to the header type, but I cannot find anything that explains it. The algorithm worked with the previous column naming.
The answer is that, as of R 3.1.0, read.table() returns a character vector instead of a numeric vector when representing a column as double would lose accuracy.
From here:
type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.
If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
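A sketch of why a single character column turns every row mean into NA, assuming exp.dataset is a data frame read with read.table() (the data below are hypothetical): apply() first converts its input with as.matrix(), and a matrix holds only one type, so one character column makes every row character.

```r
# Hypothetical data: column b was read in as character
df <- data.frame(a = c(1.5, 2.5), b = c("3.25", "4.75"),
                 stringsAsFactors = FALSE)

# apply() converts a data frame to a matrix first; one character column
# makes the whole matrix character, so mean() returns NA for every row
typeof(as.matrix(df))       # "character"

# Convert every column to numeric before taking row means
df[] <- lapply(df, as.numeric)
row_means <- apply(df, 1, mean, na.rm = TRUE)   # 2.375 3.625

# Alternatively, force numeric columns at read time via colClasses
txt <- "a b\n1.5 3.25\n2.5 4.75"
df2 <- read.table(text = txt, header = TRUE, colClasses = "numeric")
```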

Why does 1..99,999 == "1".."99,999" in R, but 100,000 != "100,000"?

In the console, go ahead and try
> sum(sapply(1:99999, function(x) { x != as.character(x) }))
[1] 0
For all values from 1 through 99999, "1" == 1, "2" == 2, ..., 99999 == "99999" are TRUE. However,
> 100000 == "100000"
[1] FALSE
Why does R have this quirky behavior, and is this a bug? What would be a workaround to, e.g., check whether every element of an atomic character vector is in fact numeric? Right now I check whether x == as.numeric(x) for each x, but that fails on certain datasets because of the problem above!
Have a look at as.character(100000): its value is not "100000", and R is essentially just telling you so.
as.character(100000)
# [1] "1e+05"
Here, from ?Comparison, are R's rules for applying relational operators to values of different types:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
Those rules mean that when you test whether 1 == "1", say, R first converts the numeric value on the LHS to a character string and then tests whether the character strings on the LHS and RHS are equal. In some cases they will be equal, but in others they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").
So, when you type 100000 == "100000", it is as if you were actually performing the following test. (Note that internally, R may well use something other than as.character() to perform the conversion.)
as.character(100000)=="100000"
# [1] FALSE
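For the workaround asked about in the question, one approach (a sketch, not the only option) is to convert explicitly with as.numeric() and test for NA, so that every comparison happens between numbers rather than strings. The example vector x is hypothetical.

```r
x <- c("1", "99999", "100000", "1e5", "two")

# Convert explicitly so that any comparison is numeric, not string-based;
# non-numeric strings become NA (the expected warning is suppressed)
num <- suppressWarnings(as.numeric(x))

is_numeric_string <- !is.na(num)   # TRUE TRUE TRUE TRUE FALSE
all(is_numeric_string)             # FALSE, because of "two"

100000 == as.numeric("100000")     # TRUE: both sides are doubles
```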
