I have the following list of dataframes structure:
str(mylist)
List of 2
$ L1 :'data.frame': 12471 obs. of 3 variables:
...$ colA : Date[1:12471], format: "2006-10-10" "2010-06-21" ...
...$ colB : int [1:12471], 62 42 55 12 78 ...
...$ colC : Factor w/ 3 levels "type1","type2","type3",..: 1 2 3 2 2 ...
I would like to replace type1 (or type2) with a new factor level, type4.
I have tried:
mylist <- lapply(mylist, transform, colC =
replace(colC, colC == 'type1','type4'))
Warning message:
1: In `[<-.factor`(`*tmp*`, list, value = "type4") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, list, value = "type4") :
invalid factor level, NA generated
I do not want to read in my initial data with stringsAsFactors = FALSE, but I have tried adding type4 as a level in my initial dataset (before splitting it into a list of data frames) using:
levels(mydf$colC) <- c(levels(mydf$colC), "type4")
but I still get the same warning when trying to replace.
How do I tell replace() that type4 is to be treated as a factor level?
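You can redefine the factor with the levels argument of factor(), so that the new level is included.
For example:
status <- factor(status, ordered = TRUE, levels = c("1", "3", "2",...))
Here c("1", "3", "2",...) stands for the full set of levels, which in your case would include type4.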
As you state, the crucial thing is to add the new factor level.
## Test data:
mydf <- data.frame(colC = factor(c("type1", "type2", "type3", "type2", "type2")))
mylist <- list(mydf, mydf)
Your data has three factor levels:
> str(mylist)
List of 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 3 levels "type1","type2",..: 1 2 3 2 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 3 levels "type1","type2",..: 1 2 3 2 2
Now add the fourth factor level, then your replace command should work:
## Change levels:
for (ii in seq_along(mylist)) levels(mylist[[ii]]$colC) <-
c(levels(mylist[[ii]]$colC), "type4")
## Replace level:
mylist <- lapply(mylist, transform, colC = replace(colC,
colC == 'type1','type4'))
The new data has four factor levels:
> str(mylist)
List of 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 4 levels "type1","type2",..: 4 2 3 2 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 4 levels "type1","type2",..: 4 2 3 2 2
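If the goal is to rename or merge levels rather than overwrite individual values, a shorter route may be to assign to levels() directly. The following is a minimal sketch on made-up data (the vector f is a hypothetical stand-in for one colC column, not the asker's data). Assigning the same name to several levels merges them, so no NA values are generated.

```r
# Hypothetical stand-in for one colC column
f <- factor(c("type1", "type2", "type3", "type2", "type2"))

# Rename every level that is "type1" or "type2" to "type4";
# levels that end up with the same name are merged into one
levels(f)[levels(f) %in% c("type1", "type2")] <- "type4"

print(levels(f))
print(table(f))
```

For the list of data frames, the same assignment could be wrapped in a small function that modifies x$colC and returns x, and then applied with lapply().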
I have a dataset named "Second"
with columns "q1", "q2", ..., "q40" and "q40_n1", "q40_n2", "q40_n3", ..., "q40_n20".
Some of them are character vectors and some are integer vectors.
My question: how can I convert several integer columns to factors at once?
q30:q35 to "factor" ------- (q(30+n))
q40_n1:q40_n4 to "factor" ---------(q40_n#)
q18:q23 to "factor"
With the dplyr package (in dplyr >= 1.0.0, mutate(across(...)) supersedes mutate_at()):
mutate_at(Second, vars(q30:q35, q40_n1:q40_n4, q18:q23), factor)
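The same conversion can be sketched in base R, without dplyr. The data frame below is a hypothetical miniature of "Second" (only a few of the columns, with invented values); on the real data the column names could be built with something like paste0("q", 30:35).

```r
# Hypothetical miniature of "Second": three integer columns and one character column
Second <- data.frame(
  q18    = c(1L, 2L, 1L),
  q30    = c(3L, 3L, 4L),
  q40_n1 = c(0L, 1L, 0L),
  q1     = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Names of the columns to convert (only those present in the toy data)
cols <- c("q18", "q30", "q40_n1")

# Convert the selected columns in place; all other columns are untouched
Second[cols] <- lapply(Second[cols], factor)

str(Second)
```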
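You can control the class of all columns on read-in using the colClasses= argument (the output below is from R < 4.0.0, where read.csv() defaulted to stringsAsFactors = TRUE):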
str(read.csv(text="a,b\na,1"))
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: int 1
str(read.csv(text="a,b\na,1", colClasses="factor"))
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: Factor w/ 1 level "1": 1
str(read.csv(text="a,b\na,1", colClasses="character"))
# 'data.frame': 1 obs. of 2 variables:
# $ a: chr "a"
# $ b: chr "1"
Or you can factorize it later:
dat <- read.csv(text="a,b\na,11")
str(dat)
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: int 11
dat$b <- factor(dat$b)
str(dat)
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: Factor w/ 1 level "11": 1
### or all columns, without regard to original class
dat <- read.csv(text="a,b\na,11")
dat[] <- lapply(dat, factor)
str(dat)
# 'data.frame': 1 obs. of 2 variables:
# $ a: Factor w/ 1 level "a": 1
# $ b: Factor w/ 1 level "11": 1
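colClasses= also accepts a named vector, which allows converting only some columns on read-in. A minimal sketch, using an invented two-row CSV string; the class of column a depends on the R version, so only column b is relied on here.

```r
# Only column b is forced to factor; column a keeps its default class
dat <- read.csv(text = "a,b\nx,1\ny,2", colClasses = c(b = "factor"))

str(dat)
```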
A numeric or integer vector like:
x <- c(1, 2, 3)
> str(x)
num [1:3] 1 2 3
can be converted to a factor vector:
x <- as.factor(x)
> x
[1] 1 2 3
Levels: 1 2 3
> str(x)
Factor w/ 3 levels "1","2","3": 1 2 3
mydf is provided for reproducibility. I have a data frame, mydf, and I want to convert its list columns to factors, but doing so throws an error.
mydf<-data.frame(col1=c("a","b"),col2=c("f","j"))
mydf$col1<-as.list(mydf$col1)
mydf$col2<-as.list(mydf$col2)
str(mydf)
This is the error I get when I try to change the lists to factor/numeric type:
mydf$col1<-as.factor(mydf$col1)
Error in order(y) : unimplemented type 'list' in 'orderVector1'
I want my data frame (mydf) to look like expected_df (a data frame with no list columns):
expected_df<-data.frame(col1=c("a","b"),col2=c("f","j"))
str(expected_df)
If you compare str(mydf) and str(expected_df), there is a difference, because I am unable to change the lists to factors in mydf. Is there a workaround for this?
str(mydf)
'data.frame': 2 obs. of 2 variables:
$ col1:List of 2
..$ : Factor w/ 2 levels "a","b": 1
..$ : Factor w/ 2 levels "a","b": 2
$ col2:List of 2
..$ : Factor w/ 2 levels "f","j": 1
..$ : Factor w/ 2 levels "f","j": 2
str(expected_df)
'data.frame': 2 obs. of 2 variables:
$ col1: Factor w/ 2 levels "a","b": 1 2
$ col2: Factor w/ 2 levels "f","j": 1 2
You can create the data frame with stringsAsFactors = TRUE, so that the columns are factors from the start and never need to be lists:
> mydf <- data.frame(col1 = c("a", "b"), col2 = c("f", "j"), stringsAsFactors = TRUE)
> mydf
col1 col2
1 a f
2 b j
> mydf$col1
[1] a b
Levels: a b
> str(mydf)
'data.frame': 2 obs. of 2 variables:
$ col1: Factor w/ 2 levels "a","b": 1 2
$ col2: Factor w/ 2 levels "f","j": 1 2
Late to the party here, but I thought I would share my experience for future searches. I was also having the 'Error in order(y)' error when trying to convert a column to factors. The way I got round it was to explicitly label the factors. In your example it would be like so:
# instead of this:
# mydf$col1 <- as.factor(mydf$col1)
# using this:
mydf$col1 <- factor(mydf$col1, levels=c("a","b"))
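Another possible workaround is to flatten each list column with unlist() before converting it. The sketch below rebuilds the situation from the question under two assumptions that are not in the original: the list elements are character strings (stringsAsFactors = FALSE is set explicitly), and every list element has length 1.

```r
# Rebuild the situation from the question, with character list elements
mydf <- data.frame(col1 = c("a", "b"), col2 = c("f", "j"),
                   stringsAsFactors = FALSE)
mydf$col1 <- as.list(mydf$col1)
mydf$col2 <- as.list(mydf$col2)

# Flatten each list column to an atomic vector, then convert it to a factor
mydf[] <- lapply(mydf, function(col) factor(unlist(col)))

str(mydf)
```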
I have implemented a simple group-by operation with the stats::aggregate function. It collects the elements of each group in a vector. I would like to make it faster using the data.table package. However, I'm not able to reproduce the wanted behaviour with data.table.
Sample dataset:
df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"), val = c("A","B","C","A","B","C","D","A","B"))
Output to reproduce with data.table:
by_group_aggregate <- aggregate(x = df$val, by = list(df$group), FUN = c)
What I've tried:
library(data.table)
data_t <- data.table(df)
# working, but not what I want
by_group_datatable <- data_t[,j = paste(val,collapse=","), by = group]
# no grouping done when using c or as.vector
by_group_datatable <- data_t[,j = c(val), by = group]
by_group_datatable <- data_t[,j = as.vector(val), by = group]
# grouping leads to error when using as.list
by_group_datatable <- data_t[,j = as.list(val), by = group]
Is it possible to have vectors of different size in a data.table column? If yes, how do I achieve it?
Here's one way:
data_t[, list(list(val)), by = group]
# group V1
#1: a A,B,C
#2: b A,B,C,D
#3: c A,B
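The outer list() is needed because j must return a list in which each element becomes a column of the result. The inner list() wraps each group's val vector so that the whole vector is stored as a single cell of a list column, instead of being spread over several rows.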
To check the structure:
str(data_t[, list(list(val)), by = group])
#Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ V1 :List of 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
# - attr(*, ".internal.selfref")=<externalptr>
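To illustrate that a column can hold vectors of different lengths, here is a minimal base R sketch of the same idea. It deliberately uses split() and a plain data.frame instead of data.table, and sets stringsAsFactors = FALSE so the cells are character vectors; the names by_group and res are made up for the example.

```r
df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"),
                 val   = c("A","B","C","A","B","C","D","A","B"),
                 stringsAsFactors = FALSE)

# One vector of values per group
by_group <- split(df$val, df$group)

# One row per group; the list of vectors is stored as a list column
res <- data.frame(group = names(by_group), stringsAsFactors = FALSE)
res$val <- unname(by_group)

lengths(res$val)   # number of values stored in each cell
res$val[[2]]       # the vector stored for the second group
```

Individual cells are extracted with [[ ]], which works the same way on the V1 list column of the data.table result.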
Using dplyr, you could do the following:
library(dplyr)
df %>% group_by(group) %>% summarise(val = list(val))
#Source: local data frame [3 x 2]
#
# group val
# (fctr) (chr)
#1 a <S3:factor>
#2 b <S3:factor>
#3 c <S3:factor>
Check the structure:
df %>% group_by(group) %>% summarise(val = list(val)) %>% str
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ val :List of 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
Here is another option with dplyr/tidyr
library(dplyr)
library(tidyr)
res <- df %>%
nest(-group) # tidyr >= 1.0.0 prefers nest(data = -group)
str(res)
#'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ data :List of 3
# ..$ :'data.frame': 3 obs. of 1 variable:
# .. ..$ val: Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ :'data.frame': 4 obs. of 1 variable:
# .. ..$ val: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ :'data.frame': 2 obs. of 1 variable:
# .. ..$ val: Factor w/ 4 levels "A","B","C","D": 1 2
Here is an example that was taken from a fellow SO member.
# load dplyr for filter() and %>%
library(dplyr)
# data
f <- c("a","a","a","b","b","c")
s <- c("fall","spring","other", "fall", "other", "other")
v <- c(3,5,1,4,5,2)
(dat0 <- data.frame(f, s, v))
# f s v
#1 a fall 3
#2 a spring 5
#3 a other 1
#4 b fall 4
#5 b other 5
#6 c other 2
(sp.tmp <- filter(dat0, s == "spring"))
# f s v
#1 a spring 5
(str(sp.tmp))
#'data.frame': 1 obs. of 3 variables:
# $ f: Factor w/ 3 levels "a","b","c": 1
# $ s: Factor w/ 3 levels "fall","other",..: 3
# $ v: num 5
The data frame returned by filter() has retained all the levels from the original data frame.
What would be the recommended way to drop the unused levels (e.g. "fall" and "other" in s) within the dplyr framework?
You could do something like:
dat1 <- dat0 %>%
filter(s == "spring") %>%
droplevels()
Then
str(dat1)
#'data.frame': 1 obs. of 3 variables:
# $ f: Factor w/ 1 level "a": 1
# $ s: Factor w/ 1 level "spring": 1
# $ v: num 5
You could use droplevels
sp.tmp <- droplevels(sp.tmp)
str(sp.tmp)
#'data.frame': 1 obs. of 3 variables:
# $ f: Factor w/ 1 level "a": 1
# $ s: Factor w/ 1 level "spring": 1
# $ v: num 5
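To see what droplevels() changes, it can help to look at a single factor. A minimal sketch, reusing the season values from the question in a standalone vector (the names s and spring_only are only for this example):

```r
s <- factor(c("fall", "spring", "other", "fall", "other", "other"))

# Subsetting keeps the full set of levels, used or not
spring_only <- s[s == "spring"]
levels(spring_only)

# droplevels() keeps only the levels that still occur
levels(droplevels(spring_only))

# Calling factor() on a factor has the same effect
levels(factor(spring_only))
```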
I want to convert variables into factors using apply():
a <- data.frame(x1 = rnorm(100),
x2 = sample(c("a","b"), 100, replace = TRUE),
x3 = factor(c(rep("a",50) , rep("b",50))))
a2 <- apply(a, 2,as.factor)
apply(a2, 2,class)
results in:
x1 x2 x3
"character" "character" "character"
I don't understand why this results in character vectors instead of factor vectors.
apply() first converts your data.frame to a character matrix, so the column classes are lost. Use lapply() instead:
lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
In the second command, apply() likewise coerces the result to a character matrix. With lapply() the factors are kept:
a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
For a quick look, though, you could simply use str():
str(a)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: num -1.79 -1.091 1.307 1.142 -0.972 ...
# $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
Additional explanation according to comments:
Why does the lapply work while apply doesn't?
The first thing apply() does is convert its first argument to a matrix, so apply(a, ...) behaves like apply(as.matrix(a), ...). As you can see, str(as.matrix(a)) gives you:
chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "x1" "x2" "x3"
There are no more factors, so class() returns "character" for all columns.
lapply works on columns so gives you what you want (it does something like class(a$column_name) for each column).
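The difference can be checked on a tiny data frame. This is a minimal sketch with invented data (d, m, via_apply and via_sapply are names made up for the example); sapply() is used only to get a compact named vector of classes.

```r
d <- data.frame(x1 = c(1.5, 2.5), x2 = factor(c("a", "b")))

# apply() works on as.matrix(d), which is a character matrix here
m <- as.matrix(d)
typeof(m)
via_apply <- apply(d, 2, class)

# sapply()/lapply() see each column with its original class
via_sapply <- sapply(d, class)

via_apply
via_sapply
```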
The help page for apply explains why apply with as.factor doesn't work:
In all cases the result is coerced by
as.vector to one of the basic vector
types before the dimensions are set,
so that (for example) factor results
will be coerced to a character array.
The help page for sapply explains why sapply with as.factor doesn't work either:
Value (...) An atomic vector or matrix
or list of the same length as X (...)
If simplification occurs, the output
type is determined from the highest
type of the return values in the
hierarchy NULL < raw < logical <
integer < real < complex < character <
list < expression, after coercion of
pairlists to lists.
You never get a matrix of factors or a data.frame this way.
How to convert output to data.frame?
Simple, use as.data.frame as you wrote in comment:
a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
'data.frame': 100 obs. of 3 variables:
$ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
$ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
$ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
But if you want to replace only selected character columns with factors, there is a trick:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: chr "a" "b" "c" "d" ...
$ x2: chr "A" "B" "C" "D" ...
$ x3: chr "A" "B" "C" "D" ...
columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: chr "A" "B" "C" "D" ...
You could use the same approach to replace all columns:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
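The column selection can also be computed instead of typed out. The following minimal sketch converts only the character columns and leaves everything else alone; the data frame a4 and the helper vector is_chr are hypothetical names introduced for this example.

```r
# Hypothetical data: one integer column and two character columns
a4 <- data.frame(n = 1:3, x = c("a", "b", "a"), y = c("u", "v", "w"),
                 stringsAsFactors = FALSE)

# Logical vector marking the character columns
is_chr <- vapply(a4, is.character, logical(1))

# Convert just those columns; the integer column is left alone
a4[is_chr] <- lapply(a4[is_chr], as.factor)

str(a4)
```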