Convert data.frame column format from character to factor - r

I would like to change the format (class) of some columns of my data.frame object (mydf) from character to factor.
I don't want to do this while reading the text file with the read.table() function.
Any help would be appreciated.

Hi, welcome to the world of R.
mtcars #look at this built-in data set
str(mtcars) #allows you to see the classes of the variables (all numeric)
#one approach is to index with the $ sign and use the as.factor function
mtcars$am <- as.factor(mtcars$am)
#another approach
mtcars[, 'cyl'] <- as.factor(mtcars[, 'cyl'])
str(mtcars) # now look at the classes
This also works for columns of class character, Date, integer and others.
Since you're new to R I'd suggest you have a look at these two websites:
R reference manuals:
http://cran.r-project.org/manuals.html
R Reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
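Since mtcars contains only numeric columns, here is a minimal sketch of the same two approaches applied to character columns, which is the case the question asks about. The data frame contents and the column names colour, size and value are made up for illustration.

```r
# Hypothetical data frame with two character columns (names are illustrative)
mydf <- data.frame(colour = c("red", "blue", "red"),
                   size   = c("S", "M", "L"),
                   value  = c(1.5, 2.0, 3.5),
                   stringsAsFactors = FALSE)

# Approach 1: index with $
mydf$colour <- as.factor(mydf$colour)

# Approach 2: index with [, "name"]
mydf[, "size"] <- as.factor(mydf[, "size"])

str(mydf)            # colour and size are now factors, value is still numeric
levels(mydf$colour)  # the levels are the sorted unique values of the column
```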

# To do it for all names
df[] <- lapply( df, factor) # the "[]" keeps the dataframe structure
# to do it for some names in a vector named 'col_names'
col_names <- names(df)
df[col_names] <- lapply(df[col_names] , factor)
Explanation. All data frames are lists, and the results of [ used with multiple-valued arguments are likewise lists, so looping over them is a task for lapply. The assignment above creates a list of factors that the replacement method `[<-.data.frame` should successfully stick back into the data frame, df.
Another strategy would be to convert only those columns where the number of unique items is less than some criterion, let's say fewer than the base-10 log of the number of rows as an example:
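To see why the empty brackets matter, here is a small sketch comparing the bare lapply() result with the df[] assignment. The two-column data frame is invented for the example.

```r
# Illustrative data frame with two character columns
df <- data.frame(a = c("x", "y"), b = c("u", "v"), stringsAsFactors = FALSE)

# lapply() on its own returns a plain list, not a data frame
as_list <- lapply(df, factor)
class(as_list)

# Assigning into df[] replaces the columns but keeps the data frame
df[] <- lapply(df, factor)
class(df)
sapply(df, class)
```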
cols.to.factor <- sapply( df, function(col) length(unique(col)) < log10(length(col)) )
df[ cols.to.factor] <- lapply(df[ cols.to.factor] , factor)
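As a rough illustration of that criterion, the sketch below uses a made-up data frame of 1000 rows, so the threshold is log10(1000) = 3 unique values. The column names id and group are assumptions for the example only.

```r
set.seed(1)  # only so the illustrative data is reproducible
df <- data.frame(id    = as.character(seq_len(1000)),
                 group = sample(c("a", "b"), 1000, replace = TRUE),
                 stringsAsFactors = FALSE)

# log10(1000) is 3, so only columns with fewer than 3 unique values qualify
cols.to.factor <- sapply(df, function(col) length(unique(col)) < log10(length(col)))
df[cols.to.factor] <- lapply(df[cols.to.factor], factor)

sapply(df, class)  # id stays character, group becomes a factor
```

The threshold itself is arbitrary; any rule that returns one logical value per column can be swapped in.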

You could use dplyr::mutate_if() to convert all character columns or dplyr::mutate_at() for select named character columns to factors:
library(dplyr)
# all character columns to factor:
df <- mutate_if(df, is.character, as.factor)
# select character columns 'char1', 'char2', etc. to factor:
df <- mutate_at(df, vars(char1, char2), as.factor)

If you want to change all character variables in your data.frame to factors after you've already loaded your data, you can do it like this, to a data.frame called dat:
character_vars <- sapply(dat, is.character)
dat[character_vars] <- lapply(dat[character_vars], as.factor)
This creates a vector identifying which columns are of class character, then applies as.factor to those columns.
Sample data:
dat <- data.frame(var1 = c("a", "b"),
var2 = c("hi", "low"),
var3 = c(0, 0.1),
stringsAsFactors = FALSE
)
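Putting the sample data and the conversion together, a self-contained sketch could look like this. It uses list-style indexing (no comma) because comma indexing drops a single selected column to a plain vector; the second data frame, single, is a hypothetical one-character-column case added to show that difference.

```r
dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE)

# Logical vector: TRUE for each character column
character_vars <- sapply(dat, is.character)
dat[character_vars] <- lapply(dat[character_vars], as.factor)
sapply(dat, class)

# Hypothetical case with only one character column
single <- data.frame(var1 = c("a", "b"), var3 = c(0, 0.1),
                     stringsAsFactors = FALSE)
is_chr <- sapply(single, is.character)
is.data.frame(single[, is_chr])  # comma indexing drops to a vector
is.data.frame(single[is_chr])    # list-style indexing keeps a data frame
```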

Another short way is the assignment pipe (%<>%) from the magrittr package, which pipes the left-hand side into the function and assigns the result back to it. Here it converts the character column mycolumn to a factor.
library(magrittr)
mydf$mycolumn %<>% factor

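I've been doing it with a loop. In this case I only transform character variables to factors:
# loop over column positions; seq_len() is safe even when there are no columns
for (i in seq_len(ncol(data))) {
  # [[i]] always extracts the column itself as a vector
  if (is.character(data[[i]])) {
    data[[i]] <- factor(data[[i]])
  }
}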

Unless you need to identify the columns automatically, I found this to be the simplest solution:
df$name <- as.factor(df$name)
This makes the column name in the data frame df a factor.

With dplyr 1.0.0 or later, you can use across():
library(dplyr)
df <- mtcars
#To turn 1 column to factor
df <- df %>% mutate(cyl = factor(cyl))
#Turn columns to factor based on their type.
df <- df %>% mutate(across(where(is.character), factor))
#Based on the position
df <- df %>% mutate(across(c(2, 4), factor))
#Change specific columns by their name
df <- df %>% mutate(across(c(cyl, am), factor))

Related

Why does `factor` return `NA` for my data? [duplicate]


Make list of columns numeric - R

As always, apologies for the simple Q.
I've got a large dataset and want to change a specified list of columns to the numeric class. I can do it, but it's not very elegant, and unless I change the memory settings it won't run because the merge exhausts the vector memory!
library(tidyverse)
#Extract column names I want to turn into numeric from data
make_numeric <- data[252:321] %>% select(-c(contains("UNITS"))) %>% colnames()
Here I want to turn columns that are contained in make_numeric into as.numeric and insert straight back into data. I can't do this in one go, so instead I extract the data, convert, and then merge.
tmp <- data %>% select(record_id, make_numeric)
tmp <- lapply(tmp[2:56], as.numeric)
tmp <- as.data.frame(tmp)
tmp2 <- data %>% select(-make_numeric)
tmp3 <- merge(tmp, tmp2)
I'm certain there must be a better way...
There is a dplyr solution:
library(tidyverse)  # attaches dplyr, so a separate library(dplyr) call is not needed
#Extract column names I want to turn into numeric from data
make_numeric <- data[252:321] %>% select(-c(contains("UNITS"))) %>% colnames()
#Mutate desired columns to numeric
data <- data %>% mutate_at(vars(make_numeric), as.numeric)
Does this work?
library(data.table)
# convert to data.table
dt <- as.data.table(data)
# cols is assumed to be a character vector of the column names to convert (e.g. make_numeric)
target <- intersect(colnames(dt), cols)
# convert those columns to numeric by reference
dt[, (target) := lapply(.SD, as.numeric), .SDcols = target]

Convert multiple columns of a data frame from string to numeric in R

I have a data frame containing 10 columns of data (temperatures, humidity values etc.). R identifies those as strings. I used the following command to convert one of the columns to numeric format:
df$temp_out = as.numeric(df$temp_out)
The problem is that I have another 7 columns which also need to be converted. I could do it for each of them, but I need to do it in approximately 50 data frames, so it's rather inconvenient. Any help is welcome!
We can use lapply to loop through the columns and apply as.numeric
df[cols] <- lapply(df[cols], as.numeric)
where
cols <- names(df)[4:10] # or column index (change the index if needed)
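Because the question mentions roughly 50 data frames, one possible sketch is to wrap the lapply() call in a small helper and apply it to a list of data frames. The helper name to_numeric_cols, the list df_list and the column names are all hypothetical.

```r
# Hypothetical helper: convert the named columns of one data frame to numeric
to_numeric_cols <- function(d, cols) {
  d[cols] <- lapply(d[cols], as.numeric)
  d
}

# Two small illustrative data frames standing in for the ~50 real ones
df_list <- list(
  first  = data.frame(temp_out = c("1.5", "2"), hum = c("40", "50"),
                      stringsAsFactors = FALSE),
  second = data.frame(temp_out = c("3", "4.25"), hum = c("61", "58"),
                      stringsAsFactors = FALSE)
)

# Apply the same conversion to every data frame in the list
df_list <- lapply(df_list, to_numeric_cols, cols = c("temp_out", "hum"))
str(df_list)
```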
If you prefer dplyr, another option is mutate_if():
df %>% mutate_if(is.character,as.numeric)
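Use the tidyverse to convert only the columns that are already numeric (for example, integer to double); for columns stored as strings, the predicate would be is.character, as in the previous answer: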
df <- df %>% mutate_if(is.numeric, as.numeric)
To change several character columns of a data frame to numeric by position:
df[, 4:17] <- lapply(df[, 4:17], as.numeric)
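One thing to watch for with any of these approaches: as.numeric() returns NA (with a warning) for strings that cannot be parsed as numbers. The sketch below shows this on invented data; the "n/a" entry and the column names are assumptions for illustration.

```r
# Illustrative data: every column is read as a string, one value is not a number
df <- data.frame(station  = c("A", "B", "C"),
                 temp_out = c("21.5", "19.0", "n/a"),
                 humidity = c("40", "55", "61"),
                 stringsAsFactors = FALSE)

cols <- c("temp_out", "humidity")

# suppressWarnings() is used here only to keep the example output quiet;
# in real use the warning is a useful hint that some values were not numeric
df[cols] <- lapply(df[cols], function(x) suppressWarnings(as.numeric(x)))

sapply(df, class)         # station stays character, the others are numeric
colSums(is.na(df[cols]))  # counts the values that could not be converted
```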

How to Rename Column Headers in R

I have two separate datasets: one has the column headers and another has the data.
The first one looks like this:
where I want to make the 2nd column as the column headers of the next dataset:
How can I do this? Thank you.
In general you can use colnames(), which returns a character vector with the column names of your data frame or matrix. You can then rename the columns of your data frame with:
colnames(df) <- *listofnames*
Also it is possible just to rename one name by using the [] brackets.
This would rename the first column:
colnames(df2)[1] <- "name"
For your example we are going to take the values of your column. Try this:
colnames(df2) <- as.character(df1[,2])
Take care that the number of names is identical to the number of columns.
The equivalent for rows is rownames().
A dplyr way, with reproducible code:
library(dplyr)
df <- tibble(x = 1:5, y = 11:15)
df_n <- tibble(x = 1:2, y = c("col1", "col2"))
names(df) <- df_n %>% select(y) %>% pull()
I think the select() %>% pull() syntax is easier to remember than list indexing. Also I used names over colnames function. When working with a dataframe, colnames simply calls the names function, so better to cut out the middleman and be more explicit that we are working with a dataframe and not a matrix. Also shorter to type.
You can simply do this:
names(data)[3] <- 'Newlabel'
where names(data)[3] is the column you want to rename.
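For the original two-dataset situation, a base R sketch might look like the following. Both data frames are invented stand-ins for the ones in the question: df1 holds the headers in its second column and df2 holds the data.

```r
# Hypothetical header table: the second column holds the desired names
df1 <- data.frame(id = 1:3,
                  label = c("height", "weight", "age"),
                  stringsAsFactors = FALSE)

# Hypothetical data table with default column names
df2 <- data.frame(V1 = c(170, 180), V2 = c(65, 80), V3 = c(30, 41))

# One name is needed per column, so check the counts before assigning
stopifnot(nrow(df1) == ncol(df2))
colnames(df2) <- as.character(df1[, 2])

df2
```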

Name the column of data frame and set as factor at the same time

I need your help to simplify the following code.
I need to name the columns of a matrix and format each of them as a factor.
How can I do that for 100 columns without doing it one by one?
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
train.data <- data.frame(x1=factor(z[,1]),x2=factor(z[,2]),....,x100=factor(z[,52]))
Here's one option
setNames(data.frame(lapply(split(z, col(z)), factor)), paste0("x", 1:p))
or use magrittr piping syntax
library(magrittr)
split(z, col(z)) %>%
  lapply(factor) %>%
  data.frame %>%
  setNames(paste0("x", 1:p))
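A step-by-step alternative in base R, shown as a sketch on a small matrix. The values n = 5 and p = 4 are arbitrary choices for the example, since the question does not define them.

```r
n <- 5  # number of rows (arbitrary for this example)
p <- 4  # number of columns (arbitrary for this example)
z <- matrix(sample(seq(3), n * p, replace = TRUE), nrow = n)

# Convert the matrix to a data frame, name the columns, then make each a factor
train.data <- as.data.frame(z)
names(train.data) <- paste0("x", seq_len(p))
train.data[] <- lapply(train.data, factor)

str(train.data)
```

Splitting the work into three statements makes it easy to inspect the intermediate result, at the cost of being longer than the one-liner.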
