I am trying to convert a factor (tickets_other) in a data frame (p2) into an integer. According to the R help pages, as well as advice from others, this code should work:
as.numeric(levels(p2$tickets_other))[p2$tickets_other]
The column does contain NAs, and so I get a warning:
Warning message:
NAs introduced by coercion
Which is fine, but after coercing it to numeric, it still reads as a factor:
class(p2$tickets_other)
[1] "factor"
The same result happens if I use as.numeric(as.character()):
as.numeric(as.character(p2$tickets_other))
Warning message:
NAs introduced by coercion
class(p2$tickets_other)
[1] "factor"
You're doing:
as.numeric(levels(p2$tickets_other))[p2$tickets_other]
You are reading the levels from p2$tickets_other (a vector), then converting them to numeric (still a vector), then indexing that vector by the values in p2$tickets_other.
I can't imagine this is what you really want to do.
Maybe just
as.numeric(p2$tickets_other)
is what you want?
I fixed the problem. It was actually very simple. The command:
as.numeric(levels(p2$tickets_other))[p2$tickets_other]
is correct, but I failed to store the result:
p2$tickets_other <- as.numeric(levels(p2$tickets_other))[p2$tickets_other]
Simple mistake, in retrospect. Thanks to DMT for the suggestion.
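To make the difference between the two suggestions concrete, here is a minimal sketch. The data frame below is a made-up stand-in for p2 (the real data is not shown in the question), so treat the values as assumptions.

```r
# Hypothetical stand-in for p2$tickets_other: numbers stored as factor levels,
# plus one non-numeric entry.
p2 <- data.frame(tickets_other = factor(c("10", "3", "unknown", "3")))

# as.numeric() on the factor itself returns the internal level codes,
# not the numbers that are printed.
codes <- as.numeric(p2$tickets_other)

# Converting the levels and indexing by the factor recovers the real values.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
values <- suppressWarnings(as.numeric(levels(p2$tickets_other))[p2$tickets_other])

# Nothing changes in p2 until the result is assigned back to the column.
p2$tickets_other <- values
class(p2$tickets_other)
```

Comparing codes with values shows why calling as.numeric() directly on a factor is usually not what you want.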
Error in hist.default(airquality$wind) : 'x' must be numeric
I ran "hist(airquality$wind)" and it shows this error. It is supposed to display a histogram in the plot pane, as I just followed the tutorial.
Please help me
If Wind were stored as a non-numeric type (i.e. contained values like "1" instead of 1), the following should work (note the capital W in the column name):
hist(as.numeric(airquality$Wind))
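The built-in dataset airquality is a data.frame, and its column is named Wind with a capital W. R names are case-sensitive, so airquality$wind returns NULL, which is why hist() complains that 'x' must be numeric. Use the correct name, either with $ or with [,] indexing. Try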
hist(airquality[,"Wind"])
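A quick way to check this yourself, rather than assuming what the object is or what its columns are called, is to inspect it first. This is only a sketch using base R functions on the built-in dataset:

```r
class(airquality)           # what kind of object it is
names(airquality)           # the exact column names, including capitalisation
is.null(airquality$wind)    # TRUE when no column matches the lower-case name
hist(airquality$Wind)       # the correctly capitalised name works with $
```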
In a data frame, I have a column that has numeric values with character data mixed in for some rows. I want to remove all rows with the character data and keep the rows with a number value. The df I have is 6 million rows, so I made a small object to try to solve my issue before implementing it at a larger scale.
Here is what I did:
a <- c("fruit", "love", 53)
b <- str_replace_all("^[:alpha:]", 0)
Reading answers to other UseMethod errors on here (about factors), I tried to change "a" to as.character(a) and attempt "b" again. But, I get the same error. I'm trying to simply make any alphabetic value into the number zero and I'm fairly new at all this.
There are several issues here, even in these two lines of code. First, a is a character vector, because c() coerces all of its elements to a common type and at least one of them is a character. This means that your numeric 53 is coerced into a character.
> print(a)
[1] "fruit" "love" "53"
You've got the wrong syntax for str_replace_all. See the documentation for how to use it correctly. But that's not what you want here, because you want numerics.
The first thing you need to do is convert a to a numeric. A crude way of doing this is simply
> b <- as.numeric(a)
Warning message:
NAs introduced by coercion
> b
[1] NA NA 53
And then subset to include only the numeric values in b:
> b <- b[!is.na(b)]
> b
[1] 53
But whether that's what you want to do with a 6 million row dataframe is another matter. Please think about exactly what you would like to do, supply us with better test data, and ask your question again.
There's probably a more efficient way of doing this on a large data frame (e.g. something column-wise, instead of row-wise), but to answer your specific question about each row a:
as.numeric(stringr::str_replace_all(a, "[a-z]+", "0"))
Note that the replacing value must be a character (the last argument in the function call, "0"). (You can look up the documentation from your R-console by: ?stringr::str_replace_all)
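For the original goal (dropping the rows whose value is not a number), a column-wise approach along these lines might scale better. The data frame and column names here are invented for illustration, so adapt them to your data:

```r
# Hypothetical small data frame standing in for the 6-million-row one.
df <- data.frame(id = 1:4,
                 value = c("fruit", "love", "53", "7.5"),
                 stringsAsFactors = FALSE)

# Try to parse every entry at once; anything that is not a number becomes NA.
parsed <- suppressWarnings(as.numeric(df$value))

# Keep only the rows that parsed, and store the numeric version of the column.
clean <- df[!is.na(parsed), ]
clean$value <- parsed[!is.na(parsed)]
clean
```

One caveat: rows whose value was already missing are dropped too, since they also come out as NA.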
My numbers have “,” for 1,000 and above and R considers them factors. I want to switch two such variables from factor to numeric. (Both variables are actually numbers, but R treats them as factors for some reason; the data is imported from Excel.) To change a factor variable mydata$x1 to a numeric variable I use the following code, but it does not seem to work properly and some values change. For example, it changes 8180 to zero, and the same happened to many other values. Is there another way to do this without such issues?
mydata$x1<- as.numeric(as.character(mydata$x1))
Since it seems as though the problem is that you have saved your numeric data as characters in Excel (instead of using format to display the commas) you may want a function like this.
#' Replace Commas Function
#'
#' This function converts a character representation of a number that contains a comma separator to a numeric value.
#' @keywords read data
#' @export
replaceCommas <- function(x) {
  # Strip every comma, then convert; the numeric vector is returned visibly.
  as.numeric(gsub(",", "", x, fixed = TRUE))
}
Then
rcffull$RetBackers <- replaceCommas(rcffull$Returning.Backers)
rcffull$NewBackers <- replaceCommas(rcffull$New.Backers)
The reason that G5W is asking for dput output is that he (we) are unable to figure out how something that displays as 8180 when it's a factor might not be properly converted with that code. It's not because of leading or trailing spaces (which would not appear in a printed version of a factor). Witness this test:
> as.numeric(as.character(factor(" 8180")))
[1] 8180
> as.numeric(as.character(factor(" 8180 ")))
[1] 8180
And the fact that it gets converted to 0 is a real puzzle since generally items that do not get recognized as parseable R numerics will get coerced to NA (with a warning).
> as.numeric(as.character(factor(" 0 8180 ")))
[1] NA
Warning message:
NAs introduced by coercion
We really need the dput output from the item that displays as "8180" and its neighbors.
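In case it helps, this is roughly how you could produce that output and locate the problem entries yourself. The factor below is made up, since we don't have the real data:

```r
# Hypothetical factor; the second value contains a thousands separator.
x <- factor(c("8180", "8,180", "12"))

dput(x)      # prints the levels and integer codes exactly as stored
levels(x)    # the distinct strings R actually read in

# Values that fail to parse come back as NA, which pinpoints the bad entries.
converted <- suppressWarnings(as.numeric(as.character(x)))
bad <- as.character(x)[is.na(converted)]
bad
```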
I loaded my dataset (original.csv) to R:
original <- read.csv("original.csv")
str(original) showed that my dataset has 16 variables (14 factors, 2 integers). 14 variables have missing values. That was OK, but 3 variables that are originally numbers are read as factors.
I searched the web and found this command: as.numeric(as.character(original$Tumor_Size))
(Tumor_Size is a variable that has been known as factor).
By the way, missing values in my dataset are marked as dot (.)
After running as.numeric(as.character(original$Tumor_Size)), the values of Tumor_Size were printed, followed by the warning message “NAs introduced by coercion”.
I expected the variable to be converted to numeric after running the above command, but a second str(original) showed that Tumor_Size and the other two variables were still factors. Below is a sample of my dataset:
a piece of my dataset
How can I solve my problem?
The crucial information here is how missing values are encoded in your data file. The corresponding argument in read.csv() is called na.strings. So if dots are used:
original <- read.csv("original.csv", na.strings = ".")
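Here is a small self-contained sketch of the effect. The CSV text is invented (only the Tumor_Size name comes from the question) and is read through the text argument of read.csv() so that no file is needed:

```r
# Hypothetical CSV contents; "." marks a missing Tumor_Size.
csv_text <- "ID,Tumor_Size\n1,2.5\n2,.\n3,4\n"

# Without na.strings, the dot forces the whole column to be read as text
# (or as a factor when stringsAsFactors = TRUE).
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$Tumor_Size)

# With na.strings = ".", the dot becomes NA and the column is read as numeric.
original <- read.csv(text = csv_text, na.strings = ".")
class(original$Tumor_Size)
```

Handling the dots at import time means no conversion, and no coercion warning, is needed afterwards.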
I'm not 100% sure what your problem is but maybe this will help....
original<-read.csv("original.csv",header = TRUE,stringsAsFactors = FALSE)
original$Tumor_Size<-as.numeric(original$Tumor_Size)
This will introduce NAs because R cannot convert your dot (.) to a numeric value. If you replace the NAs with a dot again, the column becomes character. To do this you can use:
original$Tumor_Size[is.na(original$Tumor_Size)]<-"."
Hope this helps.
I had a bug in my code resulting from an inadvertent comparison between a character variable and a numeric variable (they were both supposed to be numeric). This bug would have been much easier to find if R had a warning when doing this type of comparison. For example, why does this not throw a warning
> 'two' < 5
[1] FALSE
but this does throw a warning
> as.numeric('two') < 5
[1] NA
Warning message:
NAs introduced by coercion
It is not clear to me what is going on behind the scenes in the first comparison?
In your example 5 is converted to a character, so the test is the same as 'two' < as.character(5).
From ?Comparison:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
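A short sketch of what that coercion rule means in practice. The safe_less() helper is hypothetical (not something built into R) and just shows one way to fail early when a value is unexpectedly not numeric:

```r
# The number is converted to a string, so these two comparisons are the same.
"two" < 5
"two" < "5"

# The same rule means numeric-looking strings are compared as text.
"10" < 5      # compares "10" with "5" character by character

# Hypothetical guard: stop with an error if either side is not numeric.
safe_less <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y))
  x < y
}
safe_less(2, 5)
```

Calling safe_less("two", 5) stops with an error instead of silently returning a result, which would have made the original bug much easier to spot.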