I have a data frame with a variable Account_No. that is in numeric format. Some account numbers are numeric (2607242, 2607141) and some are alphanumeric (NWU14, NWU32). All the alphanumeric values show up as NA. How can I make the alphanumeric account numbers appear in my data set?
I tried:
as.numeric(x$Account_No.)
What you described sounds like you started off with either a character or factor vector/column, then tried to coerce it to numeric, e.g.
x <- c("2607242", "2607141", "NWU14", "NWU32")
as.numeric(x)
[1] 2607242 2607141 NA NA
This also generates the warning message:
NAs introduced by coercion
If you intend to store values like NWU14, which contain characters other than numbers, then you should leave the type as character or factor.
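To see which entries would be lost before converting, one option is to try the conversion quietly and list the values that fail. This is only a sketch: the data frame below is hypothetical and reuses the values from the question, and it assumes the column was read in as character.

```r
# Hypothetical data frame mirroring the question (values taken from the examples above)
x <- data.frame(Account_No. = c("2607242", "2607141", "NWU14", "NWU32"),
                stringsAsFactors = FALSE)

# Attempt the conversion without the warning and flag entries that do not parse as numbers
converted <- suppressWarnings(as.numeric(x$Account_No.))
not_numeric <- x$Account_No.[is.na(converted) & !is.na(x$Account_No.)]
not_numeric

# Keep the column as character so the alphanumeric account numbers survive
x$Account_No. <- as.character(x$Account_No.)
```

If `not_numeric` is non-empty, converting the column to numeric will turn those entries into NA.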
Related
I need to convert one of the columns in my data frame from character into numeric, however, when I apply this
data_frame$column_name <-as.numeric(as.character(data_frame$column_name))
It outputs the Warning message: NAs introduced by coercion and most of my data is replaced by NA.
Could anyone help me as to how I can convert the column from characters to numeric, without most of my data being lost?
I am new to R and would appreciate the help. Thanks!
I uploaded a data frame into R and R is identifying some of my variables as "integer" (which is correct) and others as "double".
When I look at my data frame, the correct numbers are within "double" columns and when I bring my cursor over the variable names it is showing that the "double" variables are numeric and it's showing me the correct range.
However, when I type in:
range(df$double_variable)
I get:
[1] NA NA
I need to create a variable based on an equation including both the numeric value of my "double" variables and my "integer" variables... This code is running just fine, but the values for this variable are off because it's reading all of my "double" variables as NA...
I tried converting the "double" variables as integers:
df$double_variable <- as.integer(df$double_variable)
and in doing so, I received the following warning message:
NAs introduced by coercion to integer range
And when looking at the range of this now integer variable, I again got:
[1] NA NA
Any ideas on what I should try next?
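One thing that may be worth checking (an assumption, since the data is not shown) is whether the column simply contains some missing values: range() returns NA for both ends as soon as a single value is NA. A small sketch with a made-up vector standing in for df$double_variable:

```r
# Made-up numeric (double) column with one missing value
double_variable <- c(2.5, NA, 7.25, 4)

with_na    <- range(double_variable)                 # both ends are NA
without_na <- range(double_variable, na.rm = TRUE)   # range of the non-missing values
n_missing  <- sum(is.na(double_variable))            # how many values are missing
```

If `n_missing` is small, passing na.rm = TRUE (or handling the missing rows explicitly) may be enough, with no need to convert the column to integer.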
When I convert my data frame columns to numeric, all the values become NA
offense[,2:13] <- apply(offense[,2:13],2,as.numeric)
The converted data frame.
Dataframe before conversion.
They are all numbers with no commas. I have even tried removing any white space that might be present by using
as.data.frame(apply(offense,2,function(x)gsub('\\s+','',x)))
but still the values are converted to NA on type conversion with a warning message.
I got the data from a URL (Data Science Cookbook chapter 3)
offense <- readHTMLTable(url, encoding = "UTF-8", colClasses="character")[[7]]
The imported variables are factors, so you have to use, e.g.
as.numeric(as.character(offense$`Pts/G`))
apply(offense[, 2:13], 2, function(x) as.numeric(as.character(x)))
See ?factor:
To transform a factor f to approximately its original numeric values,
as.numeric(levels(f))[f] is recommended and slightly more efficient
than as.numeric(as.character(f)).
(however, the first way did not work for me, maybe I made a mistake, but the second way with as.numeric(as.character()) works)
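To illustrate the difference between the conversions, here is a small sketch on a made-up factor. One possible reason the levels-based form fails inside apply() is that apply() first turns the data frame into a character matrix, so each column arrives without levels; lapply() passes each column as a factor instead. The data frame `df` below is hypothetical.

```r
# Illustrative factor; the values are made up for this example
f <- factor(c("10.5", "3", "10.5", "27"))

codes      <- as.numeric(f)                # internal level codes, not the original values
via_chars  <- as.numeric(as.character(f))  # original values, via character
via_levels <- as.numeric(levels(f))[f]     # original values, via the levels

# Hypothetical two-column data frame of factors, converted column by column.
# lapply() hands over each column as a factor, so levels() is available.
df <- data.frame(a = factor(c("1", "20")), b = factor(c("3.5", "4")))
df[] <- lapply(df, function(col) as.numeric(levels(col))[col])
```

Both `via_chars` and `via_levels` recover the original numbers, while `codes` only holds the positions of the levels.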
I want to convert all the NA's in one column (and only one column) of my data frame into "non-PA" instead. The class of the column is factor.
In the past I've successfully used:
df$column[is.na(df$column)] <- "non-PA"
But for some reason this time I get this error message:
In `[<-.factor`(`*tmp*`, is.na(management.points$management),
value = c(NA, : invalid factor level, NA generated
I've tried converting the column to characters and various other ways around it but I still get the same error message. What am I doing wrong?
You have to turn the column into a character vector first:
df$column <- as.character(df$column)
df$column[is.na(df$column)] <- "non-PA"
df$column <- factor(df$column)
This message appears because you cannot assign a value to a factor unless it is already one of that factor's levels; R inserts NA instead.
One potential downside (from @docendo's comment) is that this may remove unused factor levels. To keep them, you could just add "non-PA" to the levels instead of transforming to character:
levels(df$column) <- union(levels(df$column), "non-PA")
df$column[is.na(df$column)] <- "non-PA"
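A self-contained sketch of the level-adding approach, using a made-up factor with one missing value and one unused level ("C"), shows that the unused level is kept. The spelling of the new level must match exactly in both steps.

```r
# Made-up factor with a missing value and an unused level ("C")
column <- factor(c("A", NA, "B"), levels = c("A", "B", "C"))

# Add the new level first, then fill the missing values with it
levels(column) <- union(levels(column), "non-PA")
column[is.na(column)] <- "non-PA"

levels(column)   # "C" is still present alongside "non-PA"
```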
I'm using a dataset that has periods (.) in place of NAs. Right now, the column I'm looking at is a factor with levels 1, 2, and ".". I'm trying to take a mean, and obviously, na.rm isn't working. I went back and cleaned the data by changing the periods to NAs (pe94[pe94 == "."] <- NA), and that appeared to work. However, mean can't take the mean of a factor, and when I convert the factor to a numeric, the NAs become 3s. How can I get rid of this problem?
I also had similar issues (and other issues) converting factors into numbers for mathematical analysis. However, I found a fairly simple solution that seems to work. Hope this helps ...
#Script to convert factor data to numeric data without loss or alterations of values
#Sample data frame with factor variables represented by numbers
factor.vector1<-factor(x=c(111,222,333,444,555))
thousands<-c("1,000","2,000","3,000","4,000","5,000")
factor.vector2<-factor(x=thousands)
df<-data.frame(factor.vector1, factor.vector2)
#Numbers as factors without comma place holders
#1st convert dataset to character data type
df[,1]<-as.character(df[,1])
#2nd convert dataset to numeric data type
df[,1]<-as.numeric(df[,1])
#Numbers as factors WITH comma place holders
#If data contains commas in the numbers (e.g. 2,000) use gsub to remove commas
#If commas are not removed before conversion, the value containing commas will become NA
df[,2]<-gsub(",", "", df[,2])
#1st convert dataset to character data type
df[,2]<-as.character(df[,2])
#2nd convert dataset to numeric data type
df[,2]<-as.numeric(df[,2])
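The same character-first idea can be applied to the column with periods from the question. This is a sketch on a made-up factor, not the original pe94 data: the periods are turned into NA on the character version, and only then is the column converted to numeric.

```r
# Made-up column resembling the one described: a factor with levels "1", "2" and "."
pe <- factor(c("1", "2", ".", "2", "1", "."))

# Go through character so the labels, not the level codes, are converted
values <- as.character(pe)
values[values == "."] <- NA
values <- as.numeric(values)

# Missing values are now real NAs, so na.rm works as expected
column_mean <- mean(values, na.rm = TRUE)
```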