Change specific columns to factor in R

I have a data frame named "Second" with columns "q1", "q2", ..., "q40" and "q40_n1", "q40_n2", "q40_n3", ..., "q40_n20".
Some of them are character vectors and some are integer vectors.
My question is: how can I change several integer columns to factor at once? For example:
q30:q35 to factor (the q(30+n) columns)
q40_n1:q40_n4 to factor (the q40_n# columns)
q18:q23 to factor

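With the dplyr package (mutate_at() still works, although dplyr >= 1.0.0 recommends mutate() with across() instead):
library(dplyr)
Second <- mutate_at(Second, vars(q30:q35, q40_n1:q40_n4, q18:q23), factor)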
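If you would rather avoid dplyr, the same name-range selection can be approximated in base R. This is only a sketch: the toy data frame and the helper `col_range()` are hypothetical stand-ins for the asker's "Second", and the helper assumes both endpoint names exist and appear in left-to-right order.

```r
# Hypothetical toy data standing in for the asker's "Second"
Second <- data.frame(q17 = 1:3, q18 = 4:6, q19 = 7:9, q20 = 1:3,
                     q40_n1 = 2:4, q40_n2 = 3:5, q40_n3 = c("x", "y", "z"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: names of the columns from `first` to `last` inclusive,
# similar to dplyr's q18:q20 selection. Assumes both names exist, in that order.
col_range <- function(df, first, last) {
  nm <- names(df)
  nm[match(first, nm):match(last, nm)]
}

cols <- c(col_range(Second, "q18", "q20"),
          col_range(Second, "q40_n1", "q40_n2"))
Second[cols] <- lapply(Second[cols], factor)
str(Second)
```

Columns outside the selected ranges (q17 and q40_n3 here) keep their original class.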

You can control all columns on read-in using the colClasses= argument:
str(read.csv(text="a,b\na,1", stringsAsFactors=TRUE))  # since R 4.0.0 strings stay character unless you ask for factors
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: int 1
str(read.csv(text="a,b\na,1", colClasses="factor"))
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: Factor w/ 1 level "1": 1
str(read.csv(text="a,b\na,1", colClasses="character"))
# 'data.frame': 1 obs. of 2 variables:
# $ a: chr "a"
# $ b: chr "1"
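colClasses can also target only some columns: a named vector is matched against the column names, and the unnamed columns keep the default type guessing. A minimal sketch with made-up CSV text:

```r
csv <- "a,b,c\nx,1,2\ny,3,4"
# A named colClasses entry applies only to column b; a and c are guessed as usual
dat <- read.csv(text = csv, colClasses = c(b = "factor"))
str(dat)
```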
Or you can convert columns to factors after reading:
dat <- read.csv(text="a,b\na,11", stringsAsFactors=TRUE)
str(dat)
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: int 11
dat$b <- factor(dat$b)
str(dat)
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: Factor w/ 1 level "11": 1
### or all columns, without regard to original class
dat <- read.csv(text="a,b\na,11")
dat[] <- lapply(dat, factor)
str(dat)
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: Factor w/ 1 level "11": 1
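Why `dat[] <-` rather than `dat <-`? lapply() always returns a plain list, so assigning its result directly would drop the data.frame class; the empty brackets assign into the existing columns and keep the data frame. A small sketch with made-up data:

```r
dat <- data.frame(a = "a", b = 11L)
as_list <- lapply(dat, factor)   # a plain list: the data.frame class is gone
dat[] <- lapply(dat, factor)     # [] replaces the columns in place, keeping the data.frame
class(as_list)
class(dat)
```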

A numeric or integer vector like:
x <- c(1, 2, 3)
str(x)
# num [1:3] 1 2 3
can be converted to a factor vector:
x <- as.factor(x)
x
# [1] 1 2 3
# Levels: 1 2 3
str(x)
# Factor w/ 3 levels "1","2","3": 1 2 3
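One detail worth knowing: factor() sorts the unique values of a numeric vector numerically before turning them into character labels, and the factor itself stores integer codes into those levels. A short sketch with made-up values shows why as.numeric() on a factor returns codes rather than the original numbers:

```r
x <- c(10, 2, 3)
f <- factor(x)
levels(f)                     # "2" "3" "10": unique values sorted numerically, then labelled
as.integer(f)                 # 3 1 2: the internal codes into levels(f)
as.numeric(as.character(f))   # 10 2 3: the original values recovered
```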


How to perform as.factor function?

I have multiple data frames (namely Accident, Vehicles and Casualties) which are to be merged into a single data frame, Accidents. How do I find which columns of the combined data frame are factors, i.e. how do I find the factors of Accidents?
$ accident_severity : chr "Serious" "Slight" "Slight" "Slight" ...
$ number_of_vehicles : int 1 1 2 2 1 1 2 2 2 2 ...
$ number_of_casualties : int 1 1 1 1 1 1 1 1 1 1 ...
$ date : chr "04/01/2005" "05/01/2005" "06/01/2005" "06/01/2005" ...
$ day_of_week : chr "Tuesday" "Wednesday" "Thursday" "Thursday" ...
$ time : chr "17:42" "17:36" "00:15" "00:15" ...
You can convert selected columns from character to factor using the lapply function. The code below converts the accident_severity and day_of_week columns:
df <- data.frame(accident_severity= c("Serious", "Slight", "Slight", "Slight"),
number_of_vehicles = c(1, 1, 2, 2),
number_of_casualties = c(1, 1, 1, 1),
date = c("04/01/2005", "05/01/2005", "06/01/2005", "06/01/2005"),
day_of_week = c("Tuesday", "Wednesday", "Thursday", "Thursday"),
time = c("17:42", "17:36", "00:15", "00:15"),
stringsAsFactors = FALSE)
str(df)
# 'data.frame': 4 obs. of 6 variables:
# $ accident_severity : chr "Serious" "Slight" "Slight" "Slight"
# $ number_of_vehicles : num 1 1 2 2
# $ number_of_casualties: num 1 1 1 1
# $ date : chr "04/01/2005" "05/01/2005" "06/01/2005" "06/01/2005"
# $ day_of_week : chr "Tuesday" "Wednesday" "Thursday" "Thursday"
# $ time : chr "17:42" "17:36" "00:15" "00:15"
df[c("accident_severity", "day_of_week")] <- lapply(df[c("accident_severity", "day_of_week")], factor)
str(df)
# 'data.frame': 4 obs. of 6 variables:
# $ accident_severity : Factor w/ 2 levels "Serious","Slight": 1 2 2 2
# $ number_of_vehicles : num 1 1 2 2
# $ number_of_casualties: num 1 1 1 1
# $ date : chr "04/01/2005" "05/01/2005" "06/01/2005" "06/01/2005"
# $ day_of_week : Factor w/ 3 levels "Thursday","Tuesday",..: 2 3 1 1
# $ time : chr "17:42" "17:36" "00:15" "00:15"
To find the names of the columns that are factors, you can use the is.factor function:
names(df)[unlist(lapply(df, is.factor))]
# [1] "accident_severity" "day_of_week"
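A slightly sturdier variant of the same idea uses vapply(), which guarantees exactly one logical per column, or Filter(), which returns the matching columns directly. A sketch on a small made-up data frame:

```r
df <- data.frame(accident_severity = factor(c("Serious", "Slight")),
                 number_of_vehicles = c(1, 2),
                 day_of_week = factor(c("Tuesday", "Wednesday")))

# vapply() checks that is.factor returns a single logical for every column
is_fac <- vapply(df, is.factor, logical(1))
names(df)[is_fac]

# Filter() keeps only the factor columns; names() lists them
names(Filter(is.factor, df))
```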

How to convert list given in a data frame to factor/numbers in R Data frame?

mydf is a reproducible example. I have the data frame mydf and I want to convert its list columns to factors, but it throws an error:
mydf<-data.frame(col1=c("a","b"),col2=c("f","j"))
mydf$col1<-as.list(mydf$col1)
mydf$col2<-as.list(mydf$col2)
str(mydf)
This is the error I get when I try to change the list columns to factor/numeric type:
mydf$col1<-as.factor(mydf$col1)
Error in order(y) : unimplemented type 'list' in 'orderVector1'
I want my data frame (mydf) to look like expected_df (a data frame without list columns):
expected_df<-data.frame(col1=c("a","b"),col2=c("f","j"))
str(expected_df)
If you compare str(mydf) and str(expected_df), you can see the difference: I am unable to change the list columns in mydf to factors. Is there any workaround?
str(mydf)
'data.frame': 2 obs. of 2 variables:
$ col1:List of 2
..$ : Factor w/ 2 levels "a","b": 1
..$ : Factor w/ 2 levels "a","b": 2
$ col2:List of 2
..$ : Factor w/ 2 levels "f","j": 1
..$ : Factor w/ 2 levels "f","j": 2
str(expected_df)
'data.frame': 2 obs. of 2 variables:
$ col1: Factor w/ 2 levels "a","b": 1 2
$ col2: Factor w/ 2 levels "f","j": 1 2
You can build the data frame with stringsAsFactors = TRUE in the first place:
> mydf <- data.frame(col1 = c("a", "b"), col2 = c("f", "j"), stringsAsFactors = TRUE)
> mydf
col1 col2
1 a f
2 b j
> mydf$col1
[1] a b
Levels: a b
> str(mydf)
'data.frame': 2 obs. of 2 variables:
$ col1: Factor w/ 2 levels "a","b": 1 2
$ col2: Factor w/ 2 levels "f","j": 1 2
Late to the party here, but I thought I would share my experience for future searches. I was also getting the 'Error in order(y)' error when trying to convert a column to a factor. I got round it by specifying the factor levels explicitly. In your example it would look like this:
# instead of this:
# mydf$col1 <- as.factor(mydf$col1)
# using this:
mydf$col1 <- factor(mydf$col1, levels=c("a","b"))
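Another route, sketched here under the assumption that every list element has length 1, is to flatten each list column with unlist() before calling factor(). The first three lines rebuild the question's mydf:

```r
mydf <- data.frame(col1 = c("a", "b"), col2 = c("f", "j"))
mydf$col1 <- as.list(mydf$col1)
mydf$col2 <- as.list(mydf$col2)

# unlist() flattens each list column back to one value per row
# (assumes every element has length 1); factor() then works as usual
mydf[] <- lapply(mydf, function(col) factor(unlist(col)))
str(mydf)
```

This avoids hard-coding the levels, which matters once there are more than a couple of distinct values.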

How to put different size vectors in data.table column

I have implemented a simple group-by operation with the stats::aggregate() function. It collects the elements of each group into a vector. I would like to make it faster using the data.table package. However, I'm not able to reproduce the desired behaviour with data.table.
Sample dataset:
df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"), val = c("A","B","C","A","B","C","D","A","B"))
Output to reproduce with data.table:
by_group_aggregate <- aggregate(x = df$val, by = list(df$group), FUN = c)
What I've tried:
library(data.table)
data_t <- data.table(df)
# working, but not what I want
by_group_datatable <- data_t[,j = paste(val,collapse=","), by = group]
# no grouping done when using c or as.vector
by_group_datatable <- data_t[,j = c(val), by = group]
by_group_datatable <- data_t[,j = as.vector(val), by = group]
# grouping leads to error when using as.list
by_group_datatable <- data_t[,j = as.list(val), by = group]
Is it possible to have vectors of different size in a data.table column? If yes, how do I achieve it?
Here's one way:
data_t[, list(list(val)), by = group]
# group V1
#1: a A,B,C
#2: b A,B,C,D
#3: c A,B
The outer list() defines the columns of the result in j (here a single unnamed column, V1). The inner list() wraps each group's val vector so that it is stored as one element of a list column instead of being expanded into one row per value.
To check the structure:
str(data_t[, list(list(val)), by = group])
#Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ V1 :List of 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
# - attr(*, ".internal.selfref")=<externalptr>
Using dplyr, you could do the following:
library(dplyr)
df %>% group_by(group) %>% summarise(val = list(val))
#Source: local data frame [3 x 2]
#
# group val
# (fctr) (chr)
#1 a <S3:factor>
#2 b <S3:factor>
#3 c <S3:factor>
Check the structure:
df %>% group_by(group) %>% summarise(val = list(val)) %>% str
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ val :List of 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
Here is another option with dplyr/tidyr:
library(dplyr)
library(tidyr)
res <- df %>%
nest(-group)  # tidyr >= 1.0.0 prefers nest(data = -group)
str(res)
#'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ data :List of 3
# ..$ :'data.frame': 3 obs. of 1 variable:
# .. ..$ val: Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ :'data.frame': 4 obs. of 1 variable:
# .. ..$ val: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ :'data.frame': 2 obs. of 1 variable:
# .. ..$ val: Factor w/ 4 levels "A","B","C","D": 1 2
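Without data.table or dplyr, base R can build a similar list column with split(). A minimal sketch, rebuilding the question's df with stringsAsFactors = FALSE so that val stays character:

```r
df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"),
                 val = c("A","B","C","A","B","C","D","A","B"),
                 stringsAsFactors = FALSE)

# split() returns a named list with one character vector per group
groups <- split(df$val, df$group)

res <- data.frame(group = names(groups), stringsAsFactors = FALSE)
res$val <- unname(groups)   # a list column: each row holds a vector of any length
str(res)
```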

How to drop unused levels after filtering by factor? [duplicate]

This question already has answers here:
Drop unused factor levels in a subsetted data frame
Here is an example that was taken from a fellow SO member.
library(dplyr)
# data
f <- c("a","a","a","b","b","c")
s <- c("fall","spring","other", "fall", "other", "other")
v <- c(3,5,1,4,5,2)
(dat0 <- data.frame(f, s, v, stringsAsFactors = TRUE))
# f s v
#1 a fall 3
#2 a spring 5
#3 a other 1
#4 b fall 4
#5 b other 5
#6 c other 2
(sp.tmp <- filter(dat0, s == "spring"))
# f s v
#1 a spring 5
str(sp.tmp)
#'data.frame': 1 obs. of 3 variables:
# $ f: Factor w/ 3 levels "a","b","c": 1
# $ s: Factor w/ 3 levels "fall","other",..: 3
# $ v: num 5
The df resulting from filter() has retained all the levels from the original df.
What would be the recommended way to drop the unused level(s), i.e. "fall" and "other", within the dplyr framework?
You could do something like:
dat1 <- dat0 %>%
filter(s == "spring") %>%
droplevels()
Then
str(dat1)
#'data.frame': 1 obs. of 3 variables:
# $ f: Factor w/ 1 level "a": 1
# $ s: Factor w/ 1 level "spring": 1
# $ v: num 5
You could also call droplevels on the filtered result:
sp.tmp <- droplevels(sp.tmp)
str(sp.tmp)
#'data.frame': 1 obs. of 3 variables:
#$ f: Factor w/ 1 level "a": 1
#$ s: Factor w/ 1 level "spring": 1
# $ v: num 5
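droplevels() also works on a base-R subset, and its data.frame method takes an `except` argument listing columns whose levels should be kept. A sketch rebuilding the example data with stringsAsFactors = TRUE:

```r
dat0 <- data.frame(f = c("a","a","a","b","b","c"),
                   s = c("fall","spring","other","fall","other","other"),
                   v = c(3, 5, 1, 4, 5, 2),
                   stringsAsFactors = TRUE)

sp <- dat0[dat0$s == "spring", ]     # base-R subset keeps all levels too
sp2 <- droplevels(sp)                # drop unused levels in every factor column
sp3 <- droplevels(sp, except = 1)    # leave column 1 (f) with all its levels
levels(sp$s)
levels(sp2$s)
levels(sp3$f)
```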

Data frame numeric column coerced into character vector when extracted with apply [duplicate]

I want to convert variables into factors using apply():
a <- data.frame(x1 = rnorm(100),
x2 = sample(c("a","b"), 100, replace = TRUE),
x3 = factor(c(rep("a",50) , rep("b",50))),
stringsAsFactors = TRUE)
a2 <- apply(a, 2,as.factor)
apply(a2, 2,class)
results in:
x1 x2 x3
"character" "character" "character"
I don't understand why this results in character vectors instead of factor vectors.
apply converts your data.frame to a character matrix. Use lapply:
lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
In the second command, apply again converts the data to a character matrix. Using lapply instead:
a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
But for a quick look you could use str:
str(a)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: num -1.79 -1.091 1.307 1.142 -0.972 ...
# $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
Additional explanation, based on the comments:
Why does the lapply work while apply doesn't?
The first thing apply does is convert its argument to a matrix, so apply(a, ...) is equivalent to apply(as.matrix(a), ...). As you can see, str(as.matrix(a)) gives:
chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "x1" "x2" "x3"
There are no factors left, so class returns "character" for all columns.
lapply works on the columns themselves, so it gives you what you want (it does something like class(a$column_name) for each column).
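The coercion is easy to see on a tiny made-up data frame: as.matrix() turns the mixed columns into one character matrix, so apply() never sees the factor, while vapply() iterates over the original columns:

```r
a <- data.frame(x1 = c(1.5, 2), x2 = factor(c("a", "b")))
m <- as.matrix(a)                          # mixed column types -> character matrix
typeof(m)                                  # "character"
apply(a, 2, class)                         # both columns report "character"
by_col <- vapply(a, class, character(1))   # iterates over the real columns
by_col                                     # x1 "numeric", x2 "factor"
```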
The help page for apply explains why apply and as.factor don't work together:
In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
The help page for sapply explains why sapply and as.factor don't work either:
Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.
So you never get a matrix of factors or a data.frame.
How do you convert the output to a data.frame?
Simply use as.data.frame, as you wrote in your comment:
a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
'data.frame': 100 obs. of 3 variables:
$ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
$ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
$ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
If you want to replace only selected character columns with factors, there is a trick:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: chr "a" "b" "c" "d" ...
$ x2: chr "A" "B" "C" "D" ...
$ x3: chr "A" "B" "C" "D" ...
columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: chr "A" "B" "C" "D" ...
You could use it to replace all columns using:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
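One caution with the a3[, columns_to_change] form: when only one column is selected, [, j] drops the result to a plain vector, so lapply() would then loop over individual values instead of columns. List-style indexing, a3[columns], always keeps a data frame. A small sketch (the variable one_col is illustrative):

```r
a3 <- data.frame(x1 = letters, x2 = LETTERS, x3 = LETTERS, stringsAsFactors = FALSE)
one_col <- "x1"

drop_class <- class(a3[, one_col])   # "character": [, j] drops a single column to a vector
keep_class <- class(a3[one_col])     # "data.frame": [j] keeps the data frame

# list-style indexing is safe whether one or many columns are selected
a3[one_col] <- lapply(a3[one_col], as.factor)
str(a3)
```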
