I just started learning R, and in my first assignment I need to compare a bunch of variables. The comparison should return FALSE not only when two values are unequal but also when their types differ.
For example:
7 == "7"
gives TRUE, which should be FALSE.
Currently, I am handling this as follows:
var1 = 8 == "8"
var2 = typeof(8) == typeof("8")
var1 & var2
I guess there should be a much simpler approach.
It seems that R is implicitly converting 7 to "7", as it does when a numeric is added to a character vector.
So is there a way to get the same result in one line?
From the ?Comparison help page:
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
On the same help page, the authors warn against using == and != for tests in if-expressions. They recommend using identical() instead:
7 == "7"
# TRUE
identical(7, "7")
# FALSE
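As a one-line answer to the question, here is a minimal sketch. identical() checks type and value together, but it returns a single TRUE or FALSE for the whole object (and it also compares attributes such as names). The helper strict_equal is a hypothetical name, not a base R function, shown only for the element-wise case.

```r
x <- 7
y <- "7"

loose  <- x == y            # TRUE: 7 is coerced to "7" before comparing
strict <- identical(x, y)   # FALSE: double vs. character

# identical() is also strict about integer vs. double storage.
int_vs_dbl <- identical(7L, 7)   # FALSE

# Hypothetical helper (not part of base R): element-wise comparison
# that is TRUE only where the values match AND the two types agree.
strict_equal <- function(a, b) typeof(a) == typeof(b) & a == b

same_type <- strict_equal(c(1, 2), c(1, 3))   # TRUE FALSE
diff_type <- strict_equal(7, "7")             # FALSE
```

If integer and double should count as the same type, comparing is.numeric(a) and is.numeric(b) instead of typeof() would be one way to relax the check.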
In a dataframe, I have a column that has numeric values and some mixed in character data for some rows. I want to remove all rows with the character data and keep those rows with a number value. The df I have is 6 million rows, so I simply made a small object to try to solve my issue and then implement at a larger scale.
Here is what I did:
a <- c("fruit", "love", 53)
b <- str_replace_all("^[:alpha:]", 0)
Reading answers to other UseMethod errors on here (about factors), I tried to change "a" to as.character(a) and attempt "b" again. But, I get the same error. I'm trying to simply make any alphabetic value into the number zero and I'm fairly new at all this.
There are several issues here, even in these two lines of code. First, a is a character vector, because c() coerces all of its elements to a single common type and at least one element is a character string. This means that your numeric 53 is coerced into a character.
> print(a)
[1] "fruit" "love" "53"
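The coercion is decided by the most general type among all the elements, not by their position. A quick check, as a small sketch with made-up values:

```r
# The position of the character element does not matter.
t1 <- typeof(c("fruit", "love", 53))   # "character"
t2 <- typeof(c(53, "fruit"))           # "character"

# The same hierarchy applies further down: logical < integer < double.
t3 <- typeof(c(1L, 2.5))               # "double"
t4 <- typeof(c(TRUE, 1L))              # "integer"
```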
You've got the wrong syntax for str_replace_all: it takes the string to search first, then the pattern, then the replacement. See the documentation for how to use it correctly. But that's not what you want here, because you want numerics.
The first thing you need to do is convert a to a numeric. A crude way of doing this is simply
> b <- as.numeric(a)
Warning message:
NAs introduced by coercion
> b
[1] NA NA 53
And then subset to include only the numeric values in b:
> b <- b[!is.na(b)]
> b
[1] 53
But whether that's what you want to do with a 6 million row dataframe is another matter. Please think about exactly what you would like to do, supply us with better test data, and ask your question again.
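For the data frame case, the same idea can be applied to a whole column at once. This is only a sketch: the data frame df and its columns id and value are made up to stand in for the real data, and it assumes that any entry as.numeric() cannot parse should be dropped (rows that were already NA are dropped too).

```r
# Toy data standing in for the real data frame (names are assumptions).
df <- data.frame(id = 1:4,
                 value = c("fruit", "love", "53", "7.5"),
                 stringsAsFactors = FALSE)

# Parse the column once; unparseable entries become NA.
num  <- suppressWarnings(as.numeric(df$value))
keep <- !is.na(num)

# Keep only the rows that parsed, and store the parsed numbers.
clean <- df[keep, , drop = FALSE]
clean$value <- num[keep]
```

Because everything here is vectorized, there is no per-row loop, which matters at the 6 million row scale.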
There's probably a more efficient way of doing this on a large data frame (e.g. something column-wise, instead of row-wise), but to answer your specific question about each row a:
as.numeric(stringr::str_replace_all(a, "[a-z]+", "0"))
Note that the replacing value must be a character (the last argument in the function call, "0"). (You can look up the documentation from your R-console by: ?stringr::str_replace_all)
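One caveat: the pattern "[a-z]+" may not match uppercase letters. A base R variant using the POSIX class [[:alpha:]] is sketched below; it swaps stringr for gsub(), which is a substitution on my part rather than anything from the answer above.

```r
a <- c("Fruit", "love", 53)   # coerced to "Fruit" "love" "53"

# Replace every run of letters (upper or lower case) with "0",
# then convert the result to numeric.
b <- as.numeric(gsub("[[:alpha:]]+", "0", a))
b   # 0 0 53
```

Note that mixed entries such as "abc123" would become "0123" and then 123, so this only makes sense when each entry is either all letters or all digits.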
I want to convert the NA value of the data frame to "NA" (two characters). And I applied the following code:
df2[is.na(df2)]<-"NA"
However, this approach converts data types of all values to the character type.
Here is an example:
> sd<-c(1,2,3,NA)
> sd
[1] 1 2 3 NA
> sd[is.na(sd)]<-"NA"
> sd
[1] "1" "2" "3" "NA"
What I want is keeping the data type of non-NA values unchanged. Please help, thanks.
This is an answer to the problem you described in your comment rather than the original question:
There are two ways you can go about this: either you precede your if statements with things like if (is.na(df2$i)) and set a rule for what happens if there's an NA, or you create a set of functions that replace the operators and deal with NA in a way that makes sense to you. For example:
bigger_than<- function(x,y) {
if (is.na(x)) return(FALSE)
if (is.na(y)) return(TRUE)
if (x>y) return(TRUE) else return(FALSE)
}
This will return TRUE if x is bigger than y OR if y is NA, and it will return FALSE if x is NA (or if both are NA, actually) or if x is not bigger than y. Of course, how you choose to deal with NAs could vary.
Then you can replace your if statements with things like if (bigger_than(df$i, df$h)) df$h1<-"smaller".
BTW, note that once again, if you replace df$h with a character string (e.g., "smaller"), R will change that whole column's type to char - so you want to save those char strings somewhere else.
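Note that if() only accepts a single TRUE or FALSE, so applying bigger_than to whole columns needs a vectorized version. A sketch under the same NA rules follows; bigger_than_vec and the toy data frame are hypothetical, and the labels go into a separate column h1 so that h stays numeric.

```r
# Vectorized variant: FALSE where x is NA, TRUE where only y is NA,
# otherwise the ordinary result of x > y.
bigger_than_vec <- function(x, y) {
  out <- x > y             # NA wherever x or y is NA
  out[is.na(y)] <- TRUE
  out[is.na(x)] <- FALSE   # applied last, so it wins when both are NA
  out
}

# Toy data (made up for illustration).
df <- data.frame(i = c(3, NA, 5, 1), h = c(2, 4, NA, 6))

# Labels are stored in a new column; df$h keeps its numeric type.
df$h1 <- ifelse(bigger_than_vec(df$i, df$h), "smaller", NA_character_)
```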
What is the equivalent of SQL Server's ISNUMERIC function in R? I am trying to migrate some SQL logic to R (in RStudio). I have a column that holds both character and integer values, and I want to take only the integer values and update them to -1 in an R data.table. Please help me solve the problem.
I have attached the results as an image: column "A" holds the current values, and I expect to get the values shown in column "B".
R has type-checking functions, as SQL and other languages do, such as is.numeric() and is.integer(). These return logical values, but you could use sub() or gsub() to turn TRUE into -1:
example <- list(123, 321, "not numeric", as.Date("2018/01/01"))
gsub(TRUE, -1, sapply(example, is.numeric))
[1] "-1" "-1" "FALSE" "FALSE"
Also, note that in R numeric is different from integer.
example <- list(as.integer(123), 321, "not numeric", as.Date("2018/01/01"))
example[sapply(example, is.integer)] <- -1
example
[[1]]
[1] -1
[[2]]
[1] 321
[[3]]
[1] "not numeric"
[[4]]
[1] "2018-01-01"
You can convert them back and forth with as.numeric() and as.integer(). Also, note that in R data types in this sense are referred to as the class or classes of the data, whereas the type in R refers to the storage or R internal data type.
I think if you're specifically interested in integers, then the question above is a duplicate of the following:
Check if the number is integer
Your if condition would be something like x == round(x, 0). This is TRUE when a numeric value is a whole number, whether it is stored as integer or as double; it does not apply to non-numeric classes.
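A minimal sketch of such a check, with a tolerance to absorb floating-point error. The name is_whole is hypothetical, and returning FALSE for non-numeric input is a design choice made here, not something the linked question prescribes.

```r
# TRUE where a numeric value is (within tolerance) a whole number.
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  if (!is.numeric(x)) return(rep(FALSE, length(x)))
  abs(x - round(x)) < tol
}

whole    <- is_whole(c(1, 2.5, 3))    # TRUE FALSE TRUE
storage  <- is.integer(c(1, 2.5, 3))  # FALSE: the vector is stored as double
not_num  <- is_whole("7")             # FALSE: character input
```

This shows the difference between the two questions: is.integer() asks how the vector is stored, while is_whole() asks whether each value is a whole number.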
I finally fixed this issue with the following steps.
First, I captured all numeric values in a separate data table using the script below:
CustomDerivedL2AMID <- (subset(DimCombinedEnduser$DRVDEUL2AMID, grepl('^\\d+$',DimCombinedEnduser$DRVDEUL2AMID)))
library(data.table)
HandleDerivedL2AMID <-data.table(CustomDerivedL2AMID)
Then I matched the HandleDerivedL2AMID table results against the original data table and replaced all matching values with -1:
DCE$DRVDEUL2AMID <- replace(DCE$DRVDEUL2AMID,DCE$DRVDEUL2AMID %in% HandleDerivedL2AMID$CustomDerivedL2AMID,'-1')
Now I see only character values and -1; there are no other numeric values left under DRVDEUL2AMID in the data set.
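The same result can probably be reached in one step, without the intermediate table. A base R sketch on a made-up vector follows (the name ids is an assumption; on the real data the vector would be the DRVDEUL2AMID column):

```r
# Made-up values standing in for the real column.
ids <- c("123", "abc", "45x", "678", NA)

# TRUE only for strings made up entirely of digits; NA gives FALSE.
is_digits <- grepl("^\\d+$", ids)

# Overwrite the all-digit entries in place.
ids[is_digits] <- "-1"
ids   # "-1" "abc" "45x" "-1" NA
```

The pattern '^\\d+$' only accepts unsigned whole numbers, so values such as "-5" or "1.5" are left untouched; whether that matches the SQL logic being migrated would need checking.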
I'm using iris dataset.
Ran the following code:
functionq3 <- function(x) {
if(x[['Sepal.Length']] > 5) {
return("greater than 5")
}
else {
return("less than 5")
}
}
outputq3 <- apply(iris,1,functionq3)
print(outputq3)
It returns "greater than 5" even if the value is 5. I'm expecting "less than 5". What's going wrong?
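apply() first converts the iris data frame to a matrix, and because the Species column is not numeric, every element becomes a character string. Then, in your function, the comparison operator > coerces the numeric 5 on the RHS of x[['Sepal.Length']] > 5 to the character string "5".
So the comparison actually performed for a sepal length of 5.0 is "5.0" > "5". Its result depends on how the character strings "5.0" and "5" are collated in the current locale; since "5" is a prefix of "5.0", it sorts first and the comparison is TRUE.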
See ?Comparison
Comparison of strings in character vectors is lexicographic within the
strings using the collating sequence of the locale in use ...
... If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
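One way around this, sketched below, is to skip apply() and compare the numeric column directly with the vectorized ifelse(), so nothing is coerced to character. The labels are kept exactly as in the question.

```r
# What apply() hands to the function: a character value, not a number.
first_row_value <- apply(iris, 1, function(x) x[["Sepal.Length"]])[1]
is.character(first_row_value)   # TRUE

# Vectorized comparison on the numeric column itself.
outputq3 <- ifelse(iris$Sepal.Length > 5, "greater than 5", "less than 5")

# Rows where Sepal.Length is exactly 5 now get the expected label.
unique(outputq3[iris$Sepal.Length == 5])   # "less than 5"
```

If apply() is needed for other reasons, restricting it to the numeric columns, for example apply(iris[1:4], 1, functionq3), should also avoid the coercion to character.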
In the console, go ahead and try
> sum(sapply(1:99999, function(x) { x != as.character(x) }))
0
For all values from 1 through 99999, the comparisons "1" == 1, "2" == 2, ..., 99999 == "99999" are TRUE. However,
> 100000 == "100000"
FALSE
Why does R have this quirky behavior, and is this a bug? What would be a workaround to, e.g., check if every element in an atomic character vector is in fact numeric? Right now I was trying to check whether x == as.numeric(x) for each x, but that fails on certain datasets due to the above problem!
Have a look at as.character(100000). Its value is not equal to "100000" (have a look for yourself), and R is essentially just telling you so.
as.character(100000)
# [1] "1e+05"
Here, from ?Comparison, are R's rules for applying relational operators to values of different types:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
Those rules mean that when you test whether 1=="1", say, R first converts the numeric value on the LHS to a character string, and then tests for equality of the character strings on the LHS and RHS. In some cases those will be equal, but in other cases they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").
So, when you type 100000=="100000", it is as if you were actually performing the following test. (Note that internally, R may well use something other than as.character() to perform the conversion.)
as.character(100000)=="100000"
# [1] FALSE
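For the workaround asked about in the question, one option is to go in the other direction: convert the strings to numbers and look for NA, instead of converting numbers to strings. A minimal sketch follows; all_numeric is a hypothetical helper name, and it treats existing NA entries as non-numeric.

```r
# TRUE when every element of a character vector parses as a number.
all_numeric <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  !anyNA(parsed)
}

ok  <- all_numeric(c("1", "100000", "3.5"))   # TRUE
bad <- all_numeric(c("1", "abc"))             # FALSE

# If a string form is needed, format() can be told to avoid
# scientific notation explicitly.
plain <- format(100000, scientific = FALSE)   # "100000"
```

Keep in mind that as.numeric() also accepts forms such as "1e5" or "0x1A", so this is a check for "parses as a number" rather than "consists only of digits".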