Converting factors to numeric but getting NAs [duplicate] - r

This question already has answers here:
R cleaning up a character and converting it into a numeric
(2 answers)
Closed 9 years ago.
I am converting factors to numbers and have tried both solutions previously posted:
as.numeric(as.character(factor))
as.numeric(levels(factor))
In both cases I get lots of NAs and the warning "NAs introduced by coercion". When I type levels(factor), I see many percentages (these are interest rates).
Is there any way I can convert these interest rates, whose class is factor, into numeric?
Thanks,
Shelley

A "number" with a percent sign is not treated as numeric or integer in R, so you need to remove the symbol from every value first, for example with gsub, before doing the coercion.
perc <- factor(c("10%", "21.6%", "15%"))
as.numeric(as.character(perc))
[1] NA NA NA
Warning message:
NAs introduced by coercion
as.numeric(gsub("\\%", "", perc))
[1] 10.0 21.6 15.0
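The same idea can be wrapped in a small helper. This is only a sketch: percent_to_numeric and its as_proportion argument are hypothetical names (not part of base R), and it assumes each value contains a single literal "%".

```r
# Hypothetical helper: strip the "%" sign and convert to numeric.
percent_to_numeric <- function(x, as_proportion = FALSE) {
  # as.character() first, so a factor is converted via its labels, not its codes
  out <- as.numeric(sub("%", "", as.character(x), fixed = TRUE))
  # optionally rescale 21.6 to 0.216
  if (as_proportion) out / 100 else out
}

perc <- factor(c("10%", "21.6%", "15%"))
rates <- percent_to_numeric(perc)
rates
```

Using fixed = TRUE treats "%" as a plain string rather than a regular expression, so no escaping is needed.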

Related

Is there a way to transform a char-type vector to a numeric-type vector? [duplicate]

This question already has answers here:
as.numeric with comma decimal separators?
(7 answers)
Closed 2 months ago.
I have a character vector that stores numbers with 1 and 2 decimals. I would like to change it to a numeric vector that keeps all the decimals to be able to make mathematical computations with it.
a <- as.character(c('1,1','1,25','1,3','1,36'))
a
"1,1" "1,25" "1,3" "1,36"
a <- as.numeric(a)
Warning message:
NAs introduced by coercion
You're going to want to replace the comma with a decimal point using gsub and then convert the result with as.numeric:
a <- as.character(c('1,1','1,25','1,3','1,36'))
as.numeric(gsub(',', '.', a, fixed = TRUE))
[1] 1.10 1.25 1.30 1.36
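As an alternative sketch that avoids string replacement, base R's type.convert() takes a dec argument naming the decimal mark. The variable name b is just for illustration.

```r
a <- c("1,1", "1,25", "1,3", "1,36")

# Tell type.convert() that "," is the decimal mark;
# as.is = TRUE keeps non-numeric input as character instead of factor.
b <- type.convert(a, dec = ",", as.is = TRUE)
b
```

If the values come from a file, the dec argument of read.table (or read.csv2, which defaults to dec = ",") handles this at read time.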

convert character to numeric values (with 2 types of numeric values) using r [duplicate]

This question already has answers here:
How to read data when some numbers contain commas as thousand separator?
(11 answers)
Closed 2 years ago.
I have a variable that should be numeric but is stored as character. It contains two kinds of numeric values, and when I convert them to numeric one kind is not recognized as a number:
num <-c("3,98E+03", "3,98E+03","0.003382932", "5,22E+02", "0.005464587")
as.numeric(num)
[1] NA NA 0.003382932 NA 0.005464587
Warning message:
NAs introduced by coercion
I don't want to have NA introduced.
Thank you!
You can replace the , with . using sub:
as.numeric(sub(",", ".", num, fixed = TRUE))
#[1] 3.980000e+03 3.980000e+03 3.382932e-03 5.220000e+02 5.464587e-03
The readr package has helpful functions for parsing numbers from strings, which may generalise better. str_replace() from stringr also replaces the , with a ., similar to the answer by @GKi:
library(stringr)
library(readr)
parse_number(str_replace(num, ",", "."))
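Before choosing a fix, it can help to see exactly which strings fail to convert. A minimal base R sketch, where converted, bad and fixed are illustrative names:

```r
num <- c("3,98E+03", "3,98E+03", "0.003382932", "5,22E+02", "0.005464587")

# Convert quietly, then list the distinct inputs that became NA
converted <- suppressWarnings(as.numeric(num))
bad <- unique(num[is.na(converted) & !is.na(num)])
bad

# Once the pattern is known (a decimal comma here), fix it and convert
fixed <- as.numeric(sub(",", ".", num, fixed = TRUE))
fixed
```

This assumes the comma is always a decimal mark; if some values used it as a thousands separator, a different replacement would be needed.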

Converting factor variables to numeric ones by replacing blanks cells to NA or changing them to "." [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 3 years ago.
I want to convert some factor variables to numeric variables by this code:
df$col <- as.numeric(df$col)
The missing values in my dataset are not represented by a dot (i.e., "."). Instead, they are blank cells.
As a result, the above code assigns a number (i.e., "1") to each blank cell in my dataset.
My question is how I can convert factor variables to numeric ones while replacing blank cells with NA or changing them to ".".
Thank you so much.
The right way to go about this is to use NA for missing values.
In order to convert a factor to numeric, you will have to first coerce to character. But first, replace blank cells with NAs.
x <- factor(c("1", ""))
x[x == ""] <- NA
as.numeric(as.character(x))
[1] 1 NA
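If the data come from a file, blank cells can also be declared missing at read time through the na.strings argument of read.csv. The sketch below uses made-up inline data (csv_text and its columns are assumptions for illustration, not the asker's dataset):

```r
# Made-up example data; the second row has blank cells
csv_text <- "id,col,grp\n1,2.5,a\n2,,\n3,4,b"

# Treat empty fields as NA in every column
df <- read.csv(text = csv_text, na.strings = c("", "NA"))
str(df)
```

With the blanks already read as NA, the column no longer needs a factor-to-numeric conversion afterwards.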

How to convert data.frame column from Factor to numeric [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 8 years ago.
I have a data.frame whose class column is Factor. I'd like to convert it to numeric so that I can use correlation matrix.
> str(breast)
'data.frame': 699 obs. of 10 variables:
....
$ class : Factor w/ 2 levels "2","4": 1 1 1 1 1 2 1 1 1 1 ...
> table(breast$class)
2 4
458 241
> cor(breast)
Error in cor(breast) : 'x' must be numeric
How can I convert a Factor column to a numeric column?
breast$class <- as.numeric(as.character(breast$class))
If you have many columns to convert to numeric
indx <- sapply(breast, is.factor)
breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x)))
Another option is to use stringsAsFactors=FALSE while reading the file using read.table or read.csv
Just in case, other options to create/change columns
breast[,'class'] <- as.numeric(as.character(breast[,'class']))
or
breast <- transform(breast, class=as.numeric(as.character(class)))
From ?factor:
To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
This is FAQ 7.10. Others have shown how to apply this to a single column in a data frame, or to multiple columns in a data frame. But this is really treating the symptom, not curing the cause.
A better approach is to use the colClasses argument to read.table and related functions to tell R that the column should be numeric, so that it never creates a factor in the first place. Entries that match na.strings are read as NA; other entries that cannot be parsed as numbers usually make read.table stop with an error, which at least flags the problem early.
Another better option is to figure out why R does not recognize the column as numeric (usually a non numeric character somewhere in that column) and fix the original data so that it is read in properly without needing to create NAs.
Best is a combination of the last two: make sure the data is correct before reading it in, and specify colClasses so R does not need to guess (this can speed up reading as well).
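A minimal sketch of the colClasses approach, using made-up inline data in place of the real file (csv_text and breast_demo are illustrative names):

```r
# Made-up stand-in for the file contents
csv_text <- "id,class\n1,2\n2,4\n3,2"

# Declare the column types up front so R does not guess
breast_demo <- read.csv(
  text = csv_text,
  colClasses = c(id = "integer", class = "numeric")
)
str(breast_demo)
```

Columns not named in colClasses are still guessed by R as usual.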
As an alternative to $ notation, use a within block:
breast <- within(breast, {
class <- as.numeric(as.character(class))
})
Note that you want to convert your vector to character before converting it to numeric. Simply calling as.numeric(class) will return the ids corresponding to each factor level (1, 2) rather than the levels themselves.

Converting factor to numerical giving odd results [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 8 years ago.
I have a data frame and I need to convert 2 variables from factor to numerical variables. I tried:
df$QTY.SHIPPED=as.numeric(df$QTY.SHIPPED)
df$PRE.TAX.TOTAL.=as.numeric(df$PRE.TAX.TOTAL.)
The quantity shipped converts well because it is already in integer format. However, PRE.TAX.TOTAL. does not convert well.
PRE.TAX.TOTAL.(Factor)   PRE.TAX.TOTAL.(Numerical)
57.8                     3856
210                      2159
Does anybody have an idea why it is converting this way?
Thank you
Convert to character first and then to numeric. Otherwise as.numeric just returns the underlying integer codes that encode the factor:
> v<-factor(c("57.8","82.9"))
> as.numeric(v)
[1] 1 2
> as.numeric(as.character(v))
[1] 57.8 82.9
The documentation covers this. Typing ?factor in the console produces:
Warning
The interpretation of a factor depends on both the codes and the
"levels" attribute. Be careful only to compare factors with the same
set of levels (in the same order). In particular, as.numeric applied
to a factor is meaningless, and may happen by implicit coercion. To
transform a factor f to approximately its original numeric values,
as.numeric(levels(f))[f] is recommended and slightly more efficient
than as.numeric(as.character(f)).
Thus, the more proper way would probably be as.numeric(levels(f))[f]
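To see the difference between the three calls side by side, here is a short sketch with made-up values (f, codes, via_levels and via_char are illustrative names):

```r
f <- factor(c("57.8", "210", "57.8", "82.9"))

# Internal integer codes: positions in levels(f), not the values
codes <- as.numeric(f)

# Convert each level once, then index by the factor's codes
via_levels <- as.numeric(levels(f))[f]

# Convert every element via its label
via_char <- as.numeric(as.character(f))

codes
via_levels
identical(via_levels, via_char)
```

The levels approach converts only the distinct levels, which is why the documentation calls it slightly more efficient on long vectors with few levels.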
