I have a dataframe called Percent_DF like below.
When I try to convert the Percentage column datatype into numeric datatype, the output does not display the correct values for Percentage column.
I have tried to convert the fctr to numeric by using as.numeric datatype conversion.
Percent_DF$Percentage <- as.numeric(Percent_DF$Percentage)
I am getting 123 and 113 instead of 50.37 and 39.78 respectively. However, the Percentage column's data type has been converted into dbl. I have no idea why the above code produces different values.
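The problem is twofold. Percentage is a factor, so as.numeric returns the internal level codes (which is where 123 and 113 come from) rather than the values, and the strings contain %, which cannot be parsed as a number.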
Try:
Percent_DF$Percentage <- as.character(Percent_DF$Percentage)
Percent_DF$Percentage <- gsub("%","",Percent_DF$Percentage)
Percent_DF$Percentage <- as.numeric(Percent_DF$Percentage)
We first turn the factor into character, then remove the %, and finally convert the values to numeric.
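To see why the original call returned unrelated numbers, here is a minimal sketch. The three percentage strings are made-up stand-ins, since the real contents of Percent_DF are not shown in the question.

```r
# Hypothetical data standing in for Percent_DF; the real data frame is not shown.
Percent_DF <- data.frame(Percentage = factor(c("50.37%", "39.78%", "9.85%")))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(Percent_DF$Percentage)

# Going through character and stripping "%" recovers the actual values.
values <- as.numeric(sub("%", "", as.character(Percent_DF$Percentage), fixed = TRUE))

print(codes)
print(values)
```

The codes are just the positions of each label in the sorted set of levels, which is why they bear no relation to the percentages.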
I imported my Excel data as a data frame, and most variables came in as character.
I have tried to transform them (starting with the column average) to numeric and make clear the "," meant decimals but it automatically fills all the cells with NA. When I print the data frame again or when I try to do the summary it is only NAs instead of numbers. I got a warning after both trials:
class(ArgIncome$Average) <- "numeric"
ArgIncome$Average <- as.numeric(as.character(ArgIncome$Average))
saying
"NAs introduced by coercion".
You can transform character variable into numeric like this:
ArgIncome$Average <- as.numeric(ArgIncome$Average) #if character
ArgIncome$Average <- as.numeric(as.character(ArgIncome$Average)) #if factor
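If the values use a comma as the decimal mark, as the question describes, as.numeric cannot parse them and returns NA. A minimal sketch of one way around this, assuming made-up values for ArgIncome$Average (the real data is not shown) and assuming no thousands separators are present:

```r
# Hypothetical values standing in for ArgIncome$Average (the real data is not shown).
ArgIncome <- data.frame(Average = c("1234,5", "987,25", "42"),
                        stringsAsFactors = FALSE)

# Replace the decimal comma with a dot, then convert.
# as.character() makes this safe for factor columns as well.
ArgIncome$Average <- as.numeric(
  sub(",", ".", as.character(ArgIncome$Average), fixed = TRUE)
)

print(ArgIncome$Average)
```

If the column also contained dots as thousands separators, those would have to be removed before the comma is replaced.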
I need to convert this character value, 2.41567e-2, into a numeric so that I can do arithmetic with it. (There is a whole column of character values like this that need to be converted to numeric.)
I've changed it from a different class into a string so that it no longer holds additional data in it.
You can use the as.numeric() function, which also parses scientific notation such as 2.41567e-2.
Here is a complete example:
# Build sample data frame
x = c('2.41567e-2', '2.41567e-2', '2.41567e-2')
df = data.frame(x, stringsAsFactors = FALSE)
# Convert the values to numeric (as.character() guards against factor columns)
df$x = as.numeric(as.character(df$x))
# Check the value type
typeof(df$x[[1]])
This returns:
[1] "double"
I generally like R, but the type conversion issues are driving me crazy.
Following issue:
I read a data frame from a database connection. The result is a data frame with character columns.
I know that the first column is a date format - all the others are numeric. However, no matter how I tried to convert the character columns of the data frame into the correct types, it didn't work out.
Upon conversion of the data frame into a matrix and then back into a data frame, all columns became type factor, and casting the factors to numeric produced wrong results because the indices of the factor levels were converted instead of the real values.
Moreover, if the table is big in size - I do not want to convert each column manually. Isn't there a way to get this done automatically?
We can use type.convert by looping over the columns of the dataset with lapply. Convert each column to character and apply type.convert with as.is = TRUE. A column that cannot be parsed as a number stays character, and since only a single column is non-numeric, we can convert that one to Date class. The date format is not given, so if it is not one that as.Date recognizes by default, specify the format argument in as.Date.
df1[] <- lapply(df1, function(x) {
  x1 <- type.convert(as.character(x), as.is = TRUE)
  if (is.character(x1)) as.Date(x1) else x1
})
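A small worked example of this approach, using a hypothetical three-column data frame (the column names and values are invented for illustration, and the dates are assumed to be in the default YYYY-MM-DD format):

```r
# Hypothetical data frame: every column arrives as character, as in the question.
df1 <- data.frame(
  day   = c("2020-01-01", "2020-01-02"),
  price = c("1.5", "2.25"),
  count = c("3", "4"),
  stringsAsFactors = FALSE
)

# as.is = TRUE keeps non-numeric columns as character instead of factor,
# so the remaining character column can be handed to as.Date().
df1[] <- lapply(df1, function(x) {
  x1 <- type.convert(as.character(x), as.is = TRUE)
  if (is.character(x1)) as.Date(x1) else x1
})

str(df1)
```

Assigning into df1[] rather than df1 keeps the result a data frame instead of a plain list.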
I'm using a dataset that has periods (.) in place of NAs. Right now, the column I'm looking at is a factor with levels 1, 2, and .. I'm trying to take a mean, and obviously, na.rm isn't working. I went back and cleaned the data by changing the periods to NAs (pe94[pe94 == "."] <- NA), and that appeared to work. However, mean can't take the mean of a factor, and when I convert the factor to a numeric, the NAs become 3s. How can I get rid of this problem?
I also had similar issues (and other issues) converting factors into numbers for mathematical analysis. However, I found a fairly simple solution that seems to work. Hope this helps ...
#Script to convert factor data to numeric data without loss or alterations of values
#Sample data frame with factor variables represented by numbers
factor.vector1<-factor(x=c(111,222,333,444,555))
thousands<-c("1,000","2,000","3,000","4,000","5,000")
factor.vector2<-factor(x=thousands)
df<-data.frame(factor.vector1, factor.vector2)
#Numbers as factors without comma place holders
#1st convert dataset to character data type
df[,1]<-as.character(df[,1])
#2nd convert dataset to numeric data type
df[,1]<-as.numeric(df[,1])
#Numbers as factors WITH comma place holders
#If data contains commas in the numbers (e.g. 2,000) they must be removed before conversion
#If commas are not removed before conversion, the value containing commas will become NA
#1st convert dataset to character data type
df[,2]<-as.character(df[,2])
#2nd use gsub to remove commas
df[,2]<-gsub(",", "", df[,2])
#3rd convert dataset to numeric data type
df[,2]<-as.numeric(df[,2])
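The same character-first conversion also covers the period-as-missing case from the question. A minimal sketch, assuming a made-up factor in place of the real pe94 column:

```r
# Hypothetical column standing in for the one in pe94 (the real data is not shown).
f <- factor(c("1", "2", ".", "2", "1"))

# as.numeric(f) would return level codes, so convert the labels instead.
# "." cannot be parsed as a number and becomes NA; the coercion warning
# is expected here and is silenced.
x <- suppressWarnings(as.numeric(as.character(f)))

print(x)
print(mean(x, na.rm = TRUE))
```

Because the periods turn into NA during the conversion itself, there is no need to recode them beforehand.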
I want to convert a column of numbers to numeric, but there are certain cells that say "New" and "Gone", which I want to retain as characters.
If I use as.numeric(df$col1), the numbers are converted to numeric, but the words are coerced into "NA" values.
Is there any way that I could convert all the numbers to numeric while preventing this coercion?
You can't do it with a vector because vectors can only contain a single type. However, you could do it with a list.
Data <- data.frame(col1=c("1","2","New","3","Gone"), stringsAsFactors=FALSE)
List <- lapply(as.list(Data$col1), type.convert, as.is=TRUE)
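To check what the list holds, one can inspect the class of each element and operate on the numeric ones only. This is just an illustrative sketch; the names types and total are not from the answer.

```r
Data <- data.frame(col1 = c("1", "2", "New", "3", "Gone"), stringsAsFactors = FALSE)
List <- lapply(as.list(Data$col1), type.convert, as.is = TRUE)

# Each element keeps its own type: integer for the numbers, character for the words.
types <- sapply(List, class)
print(types)

# Sum only the numeric elements.
total <- sum(unlist(Filter(is.numeric, List)))
print(total)
```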
A column of a data.frame will always be all of the same type. So you cannot have the string "New" and the number 5 in the same column.
However, an example to get you on your way:
x <- c('New', 1, 'Gone', 2)
ifelse(is.na(as.numeric(x)), x, as.numeric(x))
Depending on what you're doing this can be extended to apply to your specific case.
Per Joshua's comment, you can use functions in the ifelse statement:
ifelse(is.na(as.numeric(x)), sprintf('its a string %s', x), sprintf('its a number %f', as.numeric(x)))
However, the usual technique for dealing with this situation is as Joshua outlined in his answer.