How to make a summary from a data frame with character columns in R

I imported an Excel sheet as a data frame and most of the variables are stored as characters.
I tried to convert them to numeric (starting with the column Average), where "," is the decimal separator, but every cell is filled with NA. When I print the data frame again, or when I call summary on it, I see only NAs instead of numbers. Both of the following attempts gave a warning:
class(ArgIncome$Average) <- "numeric"
ArgIncome$Average <- as.numeric(as.character(ArgIncome$Average))
saying
"NAs introduced by coercion".

You can convert a character variable to numeric like this:
ArgIncome$Average <- as.numeric(ArgIncome$Average) # if character
ArgIncome$Average <- as.numeric(as.character(ArgIncome$Average)) # if factor
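The warning in the question most likely comes from the decimal commas: as.numeric() only understands "." as the decimal mark, so a string such as "1234,56" becomes NA. A minimal sketch, assuming the column holds strings of that shape (the sample values below are invented for illustration):

```r
# Hypothetical sample mimicking a column imported with decimal commas
ArgIncome <- data.frame(Average = c("1234,56", "789,10", "45,5"),
                        stringsAsFactors = FALSE)

# Swap the decimal comma for a dot, then convert to numeric
ArgIncome$Average <- as.numeric(sub(",", ".", ArgIncome$Average, fixed = TRUE))

summary(ArgIncome$Average)
```

If the values also use "." as a thousands separator, those dots would need to be removed first. When the data comes from a text file, passing dec = "," to read.csv (or using read.csv2) may avoid the problem at import time.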

Related

Converting fctr to dbl in R dataframe

I have a dataframe called Percent_DF like below.
When I try to convert the Percentage column datatype into numeric datatype, the output does not display the correct values for Percentage column.
I have tried to convert the fctr to numeric by using as.numeric datatype conversion.
Percent_DF$Percentage <- as.numeric(Percent_DF$Percentage)
I am getting 123 and 113 instead of 50.37 and 39.78 respectively. However, the Percentage column's data type has been converted into dbl. I have no idea why the above code produces different values.
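The problem is that the column is a factor and the strings contain %. Calling as.numeric directly on a factor returns its internal level codes, which is why you get 123 and 113 instead of the percentages.
Try: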
Percent_DF$Percentage <- as.character(Percent_DF$Percentage)
Percent_DF$Percentage <- gsub("%","",Percent_DF$Percentage)
Percent_DF$Percentage <- as.numeric(Percent_DF$Percentage)
We first turn the factor into character, then remove the % and convert the values to numeric.
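The difference between level codes and labels can be seen on a small example. This is only a sketch: the three percentages are made up, and the real Percent_DF presumably has many more levels.

```r
# Hypothetical data: percentages stored as a factor
Percent_DF <- data.frame(Percentage = factor(c("50.37%", "39.78%", "9.85%")))

# Applied directly to a factor, as.numeric() returns the integer level codes,
# not the numbers shown in the labels
codes <- as.numeric(Percent_DF$Percentage)

# Going through character and stripping "%" recovers the real values
values <- as.numeric(sub("%", "", as.character(Percent_DF$Percentage), fixed = TRUE))

print(codes)
print(values)
```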

changing specific area from character to numeric in R programming

I use RStudio and imported a CSV file from the web.
data <- read.csv("http://databank.worldbank.org/data/download/GDP.csv", stringsAsFactors = FALSE)
In the file, column X.3 is of type character.
I want to convert rows 5 to 202 from character to numeric so that I can calculate their mean.
However, when I use the line below, the column still remains character:
data[c(5:202),"X.3"] <- as.numeric(gsub(",","",data[c(5:202),"X.3"]))
When I type class(data[10,"X.3"]), the output is character.
I am able to convert the whole column to numeric using
data[,"X.3"] <- as.numeric(gsub(",","",data[,"X.3"]))
but I want to convert only specific rows, i.e. rows 5 to 202, because the other rows of the column become NA. I am not sure how to do it.
The following changes to your code make the column numeric:
data <- read.csv("http://databank.worldbank.org/data/download/GDP.csv", header = TRUE, stringsAsFactors = FALSE, skip = 3)
# skip the first 3 rows (empty space/junk) and use the next one as the header
data <- data[-1,]
# remove the first line after the header
data$US.dollars. <- as.numeric(gsub(',','',data$US.dollars.))
# remove the thousands-separator commas so the character values convert to numeric
hist(data$US.dollars.) # sample plot
As mentioned in the comment, you cannot keep part of a column as character and part as numeric: every element of a vector must have the same type, so R implicitly coerces the numeric values to the more general type, which here is character. You can read more about this under implicit coercion in R.
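The coercion can be reproduced on a tiny data frame. This is a sketch with an invented three-row column, not the World Bank file itself:

```r
# Hypothetical column mixing a text row with comma-formatted numbers
df <- data.frame(X.3 = c("US dollars", "1,500", "2,500"),
                 stringsAsFactors = FALSE)

# Assigning numbers into part of a character column: R coerces them
# back to character so the column keeps a single type
df[2:3, "X.3"] <- as.numeric(gsub(",", "", df[2:3, "X.3"]))
class(df$X.3)

# To compute on those rows only, convert the subset into a separate vector
vals <- as.numeric(df[2:3, "X.3"])
mean(vals)
```

For the original data, the same idea would be to store as.numeric(gsub(",", "", data[5:202, "X.3"])) in its own vector and take the mean of that, leaving the column untouched.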

R: Replace value in data frame as.numeric

I've imported a data frame from a csv-file
dat3 <- read.csv(file.choose(),as.is = TRUE)
It contains names and values. My problem is that when I try to replace a value in the data frame, e.g.
dat3[3,6]<-12
R treats "12" as a text string and not a number, which prevents me from using it in mathematical operations. I'd like to be able to replace some numbers in the data frame and use them in mathematical operations.
When I try adding 1 to dat3[3,6] I get: "Error in dat3[3, 6] + 1 : non-numeric argument to binary operator".
I've tried:
lapply(dat3[3,6], as.numeric)
dat3[3,6]<-as.numeric(12)
But neither works. I have no problems using the numbers that were already in the imported data frame; this only happens for the numbers I replace.
Yes!
I've found the answer!
It is:
dat3[, c(3:6)] <- sapply(dat3[, c(3:6)], as.numeric)
to convert columns 3 to 6 to numbers.
Thank you all!
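A likely explanation for the behaviour is that the column was imported as character, so the assigned 12 is stored as the string "12". A minimal sketch with a made-up two-column data frame standing in for the imported file:

```r
# Hypothetical stand-in for the imported data: "value" was read as character
dat3 <- data.frame(name = c("a", "b", "c"),
                   value = c("1", "2", "x"),
                   stringsAsFactors = FALSE)

dat3[3, 2] <- 12   # stored as the string "12" because the column is character
class(dat3$value)

# Convert the whole column; afterwards arithmetic works
dat3$value <- as.numeric(dat3$value)
dat3[3, 2] + 1
```

When converting several columns at once, lapply over the selected columns is a possible alternative to sapply, since it never simplifies its result to a vector or matrix.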

Change data frame with factors to a big matrix R

I have a big data frame (22k rows, 400 columns) which is generated using read.csv from a csv file. It appears that every column is a factor and all the row values are the levels of this factor.
I now want to do some analysis (like PCA), which needs a numeric matrix, but even when I convert the data frame with as.matrix, all I get is
> prcomp(as.matrix(my_data))
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Is there a way of transforming this data frame with factors to a simple big matrix?
I am new in R so forgive all the (maybe terrible) mistakes.
Thanks
You can do it that way:
df<-data.frame(a=as.factor(c(1,2,3)), b=as.factor(c(2,3,4)))
m<-apply(apply(df, 1, as.character), 1, as.numeric)
apply calls a function on each row (margin 1) or each column (margin 2) of the data. It is important to convert to character first, because calling as.numeric directly on a factor returns its internal integer codes rather than the values shown.
To add column names, do this:
m<-m[-1,] # drops the first row; only do this if that row is an 'empty' one, e.g. a header line that was read as data
colnames(m)<-c("a", "b") # replace the right hand side with your desired column names, e.g. the first row of your data.frame
One more tip: you probably read the data.frame from a file. If you set the parameter header=TRUE, the first line of the file is not read as data but used for the column names of the data.frame.
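A column-wise variant of the same idea keeps the column names automatically. This is a sketch on a small invented data frame (my_data here is a stand-in, not the 22k-row file), converting each factor through character before as.numeric:

```r
# Small stand-in for the all-factor data frame from the question
my_data <- data.frame(a = factor(c("1.5", "2", "3")),
                      b = factor(c("2", "3.25", "4")))

# Convert each column via character so the labels, not the level codes, are used;
# sapply() assembles the converted columns into a numeric matrix
m <- sapply(my_data, function(col) as.numeric(as.character(col)))

is.numeric(m)
colnames(m)

# The result can be passed to prcomp()
pca <- prcomp(m)
```

Any label that is not a valid number becomes NA with a warning, so it is worth checking anyNA(m) before running the analysis.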

Convert factor that includes "." to numeric

I'm using a dataset that has periods (.) in place of NAs. Right now, the column I'm looking at is a factor with levels 1, 2, and .. I'm trying to take a mean, and obviously, na.rm isn't working. I went back and cleaned the data by changing the periods to NAs (pe94[pe94 == "."] <- NA), and that appeared to work. However, mean can't take the mean of a factor, and when I convert the factor to a numeric, the NAs become 3s. How can I get rid of this problem?
I also had similar issues (and other issues) converting factors into numbers for mathematical analysis. However, I found a fairly simple solution that seems to work. Hope this helps ...
#Script to convert factor data to numeric data without loss or alteration of values
#Sample data frame with factor variables represented by numbers
factor.vector1<-factor(x=c(111,222,333,444,555))
thousands<-c("1,000","2,000","3,000","4,000","5,000")
factor.vector2<-factor(x=thousands)
df<-data.frame(factor.vector1, factor.vector2)
#Numbers as factors without comma place holders
#1st convert dataset to character data type
df[,1]<-as.character(df[,1])
#2nd convert dataset to numeric data type
df[,1]<-as.numeric(df[,1])
#Numbers as factors WITH comma place holders
#If data contains commas in the numbers (e.g. 2,000) use gsub to remove commas
#If commas are not removed before conversion, the value containing commas will become NA
df[,2]<-gsub(",", "", df[,2])
#1st convert dataset to character data type
df[,2]<-as.character(df[,2])
#2nd convert dataset to numeric data type
df[,2]<-as.numeric(df[,2])
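For the case in the question, where "." stands for a missing value, the same character-first conversion applies. The 3s presumably come from calling as.numeric on the factor itself, which returns level codes. A minimal sketch, assuming a column shaped like the one described (the column name score and its values are invented):

```r
# Hypothetical column where "." marks a missing value
pe94 <- data.frame(score = factor(c("1", "2", ".", "2", "1")))

# Take the labels as character, mark "." as NA, then convert to numeric
x <- as.character(pe94$score)
x[x == "."] <- NA
pe94$score <- as.numeric(x)

mean(pe94$score, na.rm = TRUE)
```

If the data is read with read.csv, passing na.strings = "." at import time should make the periods arrive as NA in the first place.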
