How to convert data.frame column from Factor to numeric [duplicate] - r

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 8 years ago.
I have a data.frame whose class column is Factor. I'd like to convert it to numeric so that I can use correlation matrix.
> str(breast)
'data.frame': 699 obs. of 10 variables:
....
$ class : Factor w/ 2 levels "2","4": 1 1 1 1 1 2 1 1 1 1 ...
> table(breast$class)
2 4
458 241
> cor(breast)
Error in cor(breast) : 'x' must be numeric
How can I convert a Factor column to a numeric column?

breast$class <- as.numeric(as.character(breast$class))
If you have many columns to convert to numeric:
indx <- sapply(breast, is.factor)
breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x)))
Another option is to use stringsAsFactors=FALSE while reading the file with read.table or read.csv, so that the column is not turned into a factor in the first place.
Just in case, here are other ways to create/change columns:
breast[,'class'] <- as.numeric(as.character(breast[,'class']))
or
breast <- transform(breast, class=as.numeric(as.character(class)))
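As a rough, self-contained sketch of the multi-column approach above: the data frame `toy` and its column names are made up for illustration and are not the asker's `breast` data.

```r
# Hypothetical stand-in for the asker's data: numbers stored as factors.
toy <- data.frame(
  thickness = factor(c("5", "3", "8", "1")),
  class     = factor(c("2", "2", "4", "2")),
  size      = c(1.5, 2.0, 7.5, 1.0)
)

# Find the factor columns and convert each one via character.
indx <- sapply(toy, is.factor)
toy[indx] <- lapply(toy[indx], function(x) as.numeric(as.character(x)))

str(toy)              # every column is now numeric
cor_mat <- cor(toy)   # no longer fails with "'x' must be numeric"
cor_mat
```

Columns that were already numeric (here `size`) are left untouched because `indx` only selects factors.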

From ?factor:
To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
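A small sketch of what the quoted recommendation looks like in practice; the factor `f` below is an invented example, and the comparison only checks that both idioms agree on it.

```r
# Invented example factor whose levels are numbers stored as text.
f <- factor(c("57.8", "210", "57.8", "3.5"))

as.numeric(f)   # integer codes of the levels, NOT the original values

# Convert the (few) levels once, then index by the factor's codes.
by_levels <- as.numeric(levels(f))[f]

# Convert every element via character.
by_char <- as.numeric(as.character(f))

by_levels
identical(by_levels, by_char)
```

The first form converts each distinct level only once, which is why the help page calls it slightly more efficient on long vectors with few levels.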

This is FAQ 7.10. Others have shown how to apply this to a single column in a data frame, or to multiple columns in a data frame. But this is really treating the symptom, not curing the cause.
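A better approach is to use the colClasses argument to read.table and related functions to tell R that the column should be numeric, so that it is read as numeric and a factor is never created. Values that match na.strings are read as NA; be aware that other entries that cannot be parsed as numbers will generally make the read fail with an error rather than be converted silently.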
Another better option is to figure out why R does not recognize the column as numeric (usually a non numeric character somewhere in that column) and fix the original data so that it is read in properly without needing to create NAs.
Best is a combination of the last 2, make sure the data is correct before reading it in and specify colClasses so R does not need to guess (this can speed up reading as well).
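A minimal sketch of the colClasses idea, assuming a small CSV supplied inline through the `text` argument; the column names and values are invented for the example.

```r
# Inline CSV standing in for a file on disk (contents are made up).
csv_text <- "id,class\n1,2\n2,4\n3,NA"

# Tell R the column types up front so it does not have to guess.
dat <- read.csv(
  text = csv_text,
  colClasses = c(id = "integer", class = "numeric")
)

str(dat)      # class is numeric, never a factor
dat$class
```

With a real file you would pass its path as the first argument instead of `text`.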

As an alternative to the $ (dollar sign) notation, use a within block:
breast <- within(breast, {
class <- as.numeric(as.character(class))
})
Note that you want to convert your vector to character before converting it to numeric. Simply calling as.numeric(class) will return the ids corresponding to each factor level (1, 2) rather than the levels themselves.

Related

Converting factor variables to numeric ones by replacing blanks cells to NA or changing them to "." [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 3 years ago.
I want to convert some factor variables to numeric variables by this code:
df$col <- as.numeric(df$col)
The missing values in my dataset are not represented by a dot (i.e., "."). Instead, they are blank cells.
Therefore, the above code assigns a number (i.e., "1") to each blank cell in my dataset.
My question is how I can convert factor variables to numeric ones while replacing blank cells with NA or changing them to ".".
Thank you so much.
The right way to go about this is to use NA for missing values.
In order to convert a factor to numeric, you will have to first coerce to character. But first, replace blank cells with NAs.
x <- factor(c("1", ""))
x[x == ""] <- NA
as.numeric(as.character(x))
[1] 1 NA
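The same idea applied to a data frame column might look like the sketch below. The helper name `to_num` and the sample values are assumptions made for illustration only.

```r
# Made-up column in which blank strings stand for missing values.
df <- data.frame(col = factor(c("10", "", "30", "")))

as.numeric(df$col)   # level codes: the blank level gets a number too

# Hypothetical helper: blanks become NA, everything else is parsed.
to_num <- function(f) {
  ch <- as.character(f)
  ch[ch == ""] <- NA
  as.numeric(ch)
}

df$col <- to_num(df$col)
df$col
```

Functions such as `mean(df$col, na.rm = TRUE)` can then skip the missing entries.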

Converting a categorical column to a numeric column: define a function or load a package? [duplicate]

This question already has answers here:
Add ID column by group [duplicate]
(4 answers)
Closed 6 years ago.
I'm using this procedure to convert categorical values to numeric values using levels and merge from reshape2 library. (just two columns shown for the sake of brevity)
data
printerM user
RICOH Pam
CANON Clara
TOSHIBA Joe
RICOH Fred
CANON Clark
printers.df <- data.frame(printers=unique(data$printerM))
numbers.df <- data.frame(numbers=1:length(unique(data$printerM)))
printers.table <- as.data.frame(cbind(printers.df, numbers.df))
library(reshape2)
new.data<- merge(data, printers.table)
new.data$printers <- NULL
new.data
printer user numbers
RICOH Pam 1
CANON Clara 2
TOSHIBA Joe 3
RICOH Fred 1
CANON Clark 2
The issue is that I have 34 columns and I'd rather not write the same code 34 times, so I suppose this can be handled by:
1.- converting my code into a function
2.- using an existing R function
I'm not well versed in converting my R code into a function, and I don't know whether this kind of transformation is available in any library.
Anyway, any hint will be much appreciated.
If you are applying this function to columns of a data frame you could make use of the fact that it is really a list underneath. For each column or list component, you want to convert to numeric if it is a factor and retain other columns as they were if I understand correctly. I will give a dummy example which does this:
df = data.frame(sample(letters[1:5],10,replace=TRUE),
runif(10),
sample(LETTERS[1:5],10,replace=TRUE),
sample(letters[11:15],10,replace=TRUE),
stringsAsFactors=TRUE) # needed in R >= 4.0, where strings are no longer factors by default
colnames(df) = paste0("X",1:4)
data.frame(lapply(df, function(x) if(is.factor(x)) as.numeric(x) else x))
Edit:
Note that this changes every factor column: each column is checked, and if it is a factor its integer codes are returned via as.numeric; otherwise the original column is returned unchanged. You can also keep the original factor alongside the new numeric encoding by using list(x, as.numeric(x)) in place of as.numeric(x), but the default column names then become a bit awkward.
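For the printer data in the question, a sketch along the same lines could number the categories in order of first appearance, which reproduces the numbering the asker built by hand. The helper name `to_codes` is hypothetical.

```r
# The asker's sample data, rebuilt as a data frame with factor columns.
data <- data.frame(
  printerM = c("RICOH", "CANON", "TOSHIBA", "RICOH", "CANON"),
  user     = c("Pam", "Clara", "Joe", "Fred", "Clark"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: integer code per category, in order of appearance.
to_codes <- function(x) {
  x <- as.character(x)
  as.integer(factor(x, levels = unique(x)))
}

# Apply it to every factor column at once instead of once per column.
is_cat <- sapply(data, is.factor)
data[is_cat] <- lapply(data[is_cat], to_codes)
data
```

Plain `as.numeric(x)` on a factor would instead number the categories in the sorted order of their levels, so the explicit `levels = unique(x)` is only needed if the order of appearance matters.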

Converting factor to numerical giving odd results [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 8 years ago.
I have a data frame and I need to convert 2 variables from factor to numerical variables. I tried:
df$QTY.SHIPPED=as.numeric(df$QTY.SHIPPED)
df$PRE.TAX.TOTAL.=as.numeric(df$PRE.TAX.TOTAL.)
The quantity shipped converts well, because it is already in integer format. However, PRE.TAX.TOTAL. does not convert well.
PRE.TAX.TOTAL.(Factor) PRE.TAX.TOTAL.(Numerical)
57.8 3856
210 2159
Does anybody have an idea why it is converting this way?
Thank you
Convert to character first and then to numeric. Otherwise as.numeric just returns the underlying integer codes that encode the factor.
> v<-factor(c("57.8","82.9"))
> as.numeric(v)
[1] 1 2
> as.numeric(as.character(v))
[1] 57.8 82.9
You actually could read the documentation. Typing ?factor in console produces
Warning
The interpretation of a factor depends on both the codes and the
"levels" attribute. Be careful only to compare factors with the same
set of levels (in the same order). In particular, as.numeric applied
to a factor is meaningless, and may happen by implicit coercion. To
transform a factor f to approximately its original numeric values,
as.numeric(levels(f))[f] is recommended and slightly more efficient
than as.numeric(as.character(f)).
Thus, the more proper way would probably be as.numeric(levels(f))[f]

R package reshape function melt error: id variables not found in data when working with a lot of factors

I am working with a rarefaction output from mothur, which basically gives me a dataset containing the number of sequences sampled and the number of unique sequences in several samples. I would like to use ggplot2 to visualize this data and therefore need to use melt to go from a wide to a long format.
The problem is that I cannot make this work because melt fails with an error, which basically states:
Error: id variables not found in data: 1,3,6, (... and so on)
Because of the size of the original dataset it would be impractical to share it here. Nonetheless, one should be able to recreate the same problem using the following code:
a<-seq(0,300,3)
b<-runif(length(a))
c<-runif(length(a))
d<-as.data.frame(cbind(a,b,c))
d$a<-as.factor(d$a)
melt(d,d$a)
Which gives exactly the same error:
Error: id variables not found in data: 0,3,6,9, (...)
I fail to see what I am doing wrong. I am using R 2.15.1 on ubuntu server 12.04. Both the function reshape::melt and reshape2::melt result in the same error.
You should use:
melt(d, id.vars="a")
a variable value
1 0 b 0.019199459
2 3 b 0.693699677
3 6 b 0.937592641
4 9 b 0.299259963
5 12 b 0.485403439
...
From the help of ?melt.data.frame:
data
data frame to melt
id.vars
vector of id variables. Can be integer (variable position) or
string (variable name). If blank, will use all non-measured variables
Thus your id.vars argument should be a character vector of names, e.g. "a" or a numeric vector, e.g. 1. The length of this vector should equal the number of columns you want as your id.
Instead, you used a factor that contained far more elements than you have columns in your data.
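A reproducible sketch of the fix, assuming the reshape2 package is installed; the seed is arbitrary and only there to make the random columns repeatable.

```r
library(reshape2)   # assumption: reshape2 is installed

set.seed(1)         # arbitrary seed for repeatable random values
a <- seq(0, 300, 3)
d <- data.frame(a = factor(a), b = runif(length(a)), c = runif(length(a)))

# Pass the NAME of the id column, not the column's values.
long <- melt(d, id.vars = "a")

head(long)
dim(long)   # one row per (a, variable) pair
```

Each of the 101 values of `a` now appears once for `b` and once for `c`, which is the long layout ggplot2 expects.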

Convert factor to integer [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 6 years ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
I am manipulating a data frame using the reshape package. When using the melt function, it factorizes my value column, which is a problem because a subset of those values are integers that I want to be able to perform operations on.
Does anyone know of a way to coerce a factor into an integer? Using as.character() will convert it to the correct character, but then I cannot immediately perform an operation on it, and as.integer() or as.numeric() will convert it to the number that the system uses to store that factor, which is not helpful.
Thank you!
Jeff
Quoting directly from the help page for factor:
To transform a factor f to its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
You can combine the two functions; coerce to characters thence to numerics:
> fac <- factor(c("1","2","1","2"))
> as.numeric(as.character(fac))
[1] 1 2 1 2
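Since the question mentions that only a subset of the values are integers, a sketch for that mixed case might look as follows. The factor `fac_mixed` is an invented example, and suppressing the coercion warning is a choice made here, not something the answers above prescribe.

```r
# Invented factor mixing integer-like and non-numeric values.
fac_mixed <- factor(c("3", "x", "10", "y"))

# Entries that cannot be parsed become NA; the warning about this
# coercion is silenced deliberately.
vals <- suppressWarnings(as.integer(as.character(fac_mixed)))
vals

# Operate only on the entries that really were integers.
total <- sum(vals, na.rm = TRUE)
total
```

Leaving the warning visible instead can be useful as a reminder that some entries were not numbers.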
