Here is a list of lists, x, generated as follows:
list1 <- list(NULL, as.integer(0))
list2 <- list(NULL, as.integer(1))
list3 <- list(1:5, 0:4)
x <- list(a=list1, b=list2, c=list3)
x has the following structure:
str(x)
List of 3
$ a:List of 2
..$ : NULL
..$ : int 0
$ b:List of 2
..$ : NULL
..$ : int 1
$ c:List of 2
..$ : int [1:5] 1 2 3 4 5
..$ : int [1:5] 0 1 2 3 4
I'm trying to convert it to a data frame, coercing the elements to numeric. I first used
xc <- data.frame(lapply(x, as.numeric))
I got the following error:
Error in lapply(x, as.numeric) :
(list) object cannot be coerced to type 'double'
Actually, it only works with as.character as the function argument.
My goal is a data frame with the following structure:
str(xc)
'data.frame': 2 obs. of 3 variables:
$ a: int NA 0 ...
$ b: int NA 1 ...
$ c: int [1:5] 1 2 3 4 5 int [1:5] 0 1 2 3 4
I think the columns of the resulting data frame must be lists (this is the type that can handle multiple vectors and NULL values).
Using the dplyr or data.table package is probably the easiest way.
You can then convert the result back to a base data.frame with as.data.frame:
library(data.table)
xc <- as.data.table(x)
or
library(dplyr)
xc <- as_tibble(x) # as_data_frame() is deprecated in favour of as_tibble()
After converting to base data.frame, the result is the same:
as.data.frame(xc)
#> a b c
#> 1 NULL NULL 1, 2, 3, 4, 5
#> 2 0 1 0, 1, 2, 3, 4
The columns are lists:
str(as.data.frame(xc))
#> 'data.frame': 2 obs. of 3 variables:
#> $ a:List of 2
#> ..$ : NULL
#> ..$ : int 0
#> $ b:List of 2
#> ..$ : NULL
#> ..$ : int 1
#> $ c:List of 2
#> ..$ : int 1 2 3 4 5
#> ..$ : int 0 1 2 3 4
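For completeness, a base-R route is also possible. The sketch below is my own assumption, not something taken from the packages above: it starts from an empty data frame with the right number of rows and attaches each inner list as a list column. It also captures the error from the original as.numeric attempt, which fails because the inner lists contain elements that are not of length 1.

```r
# Same input as in the question
x <- list(a = list(NULL, 0L), b = list(NULL, 1L), c = list(1:5, 0:4))

# as.numeric() cannot coerce these inner lists, so the original attempt
# errors; capture the error instead of stopping the script
err <- tryCatch(lapply(x, as.numeric), error = function(e) e)
inherits(err, "error")

# Start with an empty data frame that has one row per inner-list element,
# then attach each inner list as a list column
xc <- data.frame(row.names = seq_along(x[[1]]))
for (nm in names(x)) {
  xc[[nm]] <- x[[nm]]
}

str(xc)
```

This assumes every inner list has the same length, since that length becomes the number of rows.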
If we want to make a reproducible question on a complex/large dataset for SO, we can use dput(head(df)) to reduce the size.
Is there a similar approach to reduce the size of complex nested lists with varying list lengths? I'm thinking an approach could be to take the first few elements from each list (say first 3) irrespective of individual list type (numeric, character etc.) and nested structure but I'm not sure how to do this.
#sample nested list
L <- list(
list(1:10),
list( list(1:10), list(1:10,1:10) ),
list(list(list(list(1:10))))
)
Running dput(L) will naturally produce the structure for the whole list. Is there a simple way to reduce the overall length of this list (something like dput(head(L)))?
I don't want to edit the structure of the list, e.g. I don't want to flatten first or anything - I just want to reduce the size of it and keep all attributes etc.
Thanks
Edit
@thelatemail's solution works well:
rapply(L, f = head, n = 3, how = "list")
What if the list contains a data.frame, though? This approach splits the df into separate lists (which I assume is to be expected, since how = "list" is specified in the rapply call). Is there a way to modify this so that it returns head(df) as a data.frame? Here is the list with a df included:
L_with_df <- list(
list(1:10),
list( list(1:10), list(1:10,1:10), df = data.frame(a = 1:20, b = 21:40) ),
list(list(list(list(1:10))))
)
rapply(L_with_df, f = head, n = 3, how = "list")
Edit 2
It seems rapply won't work on data.frames, see here.
However, rrapply (here), which is an extension of rapply, seems to do what I want:
library(rrapply)
rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE)
# [[1]]
# [[1]][[1]]
# [1] 1 2 3
# [[2]]
# [[2]][[1]]
# [[2]][[1]][[1]]
# [1] 1 2 3
# [[2]][[2]]
# [[2]][[2]][[1]]
# [1] 1 2 3
# [[2]][[2]][[2]]
# [1] 1 2 3
# [[2]]$df
# a b
# 1 1 21
# 2 2 22
# 3 3 23
# [[3]]
# [[3]][[1]]
# [[3]][[1]][[1]]
# [[3]][[1]][[1]][[1]]
# [[3]][[1]][[1]][[1]][[1]]
# [1] 1 2 3
# Warning message:
# In rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE) :
# 'dfaslist' is deprecated, use classes = 'data.frame' instead
#this produces different output?:
#rrapply(L_with_df, f = head, n = 3, classes = "data.frame")
Let's create a nested list to serve as an example.
L <- list(
list(1:10),
list( list(1:10), list(1:10,1:10) ),
list(list(list(list(1:10))))
)
Which has a structure of this:
str(L)
#List of 3
# $ :List of 1
# ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 2
# ..$ :List of 1
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# ..$ :List of 2
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 1
# ..$ :List of 1
# .. ..$ :List of 1
# .. .. ..$ :List of 1
# .. .. .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
I think the recursive apply, rapply, with f = head can handle this without breaking the structure.
rapply(L, f=head, n=3, how="list")
Checks out:
str(rapply(L, f=head, n=3, how="list"))
#List of 3
# $ :List of 1
# ..$ : int [1:3] 1 2 3
# $ :List of 2
# ..$ :List of 1
# .. ..$ : int [1:3] 1 2 3
# ..$ :List of 2
# .. ..$ : int [1:3] 1 2 3
# .. ..$ : int [1:3] 1 2 3
# $ :List of 1
# ..$ :List of 1
# .. ..$ :List of 1
# .. .. ..$ :List of 1
# .. .. .. ..$ : int [1:3] 1 2 3
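If the list may also contain data frames, a small hand-rolled recursion is one alternative to rapply. This is only a sketch: head_nested is a hypothetical helper name, not a function from base R or rrapply. It descends into plain lists and treats data frames and atomic vectors as leaves.

```r
# Hypothetical helper: recurse through plain lists, but treat data frames
# and atomic vectors as leaves and take head() of them
head_nested <- function(obj, n = 3) {
  if (is.list(obj) && !is.data.frame(obj)) {
    lapply(obj, head_nested, n = n)   # keeps list names, descends one level
  } else {
    head(obj, n)                      # leaf: vector or data.frame
  }
}

L_with_df <- list(
  list(1:10),
  list(list(1:10), list(1:10, 1:10), df = data.frame(a = 1:20, b = 21:40)),
  list(list(list(list(1:10))))
)

small <- head_nested(L_with_df, n = 3)
str(small)
dput(small)
```

One caveat: lapply keeps the names of each list but drops any other attributes set on the lists themselves, so this is not a full replacement if those attributes matter.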
I would like to loop through columns in a data set and use the name of the column to aggregate the data set. However, I am getting an error when I try to feed through the column name into the aggregate function:
"Error in model.frame.default(formula = cbind(SurveyID) ~ Panel + Category + :
variable lengths differ (found for 'i')"
Once I can store this in a temp file, I will add the temp file to a permanent dataset; however, I can't get past this part. Any help would be much appreciated!
#example of my data:
df <- data.frame("SurveyID" = c('A','B','C','D'), "Panel" = c('E','E','S','S'), "Category" = c(1,1,2,3),
"ENG" = c(3,3,1,2), "PAR" = c(3,1,1,2), "REL" = c(3,1,1,2), "CLC" = c(3,1,1,2))
#for loop to get column name to include as part of the aggregate function
for (i in colnames(df[4:7])) {
print (i)
temp <- data.frame(setNames(aggregate(cbind(SurveyID) ~ Panel + Category + i, data = df, FUN = length), c("Panel","GENDER", "Favlev", "Cnt")))
}
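You are making one newbie mistake and one more sophisticated mistake:
Newb mistake: failing to index successive items upon assignment, i.e., overwriting earlier values with new values on every pass through the loop.
Not-so-newb mistake: improper construction of formula objects. Inside a formula, i is taken literally as a variable named i, not as the column name it holds, so you need as.formula.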
temp=list() # need empty list with a name
for (i in colnames(df[4:7])) {
print (i); form <- as.formula( paste( "SurveyID ~ Panel + Category +", i) )
temp[[i]] <- data.frame(setNames(aggregate(form, data = df, FUN = length), c("Panel","GENDER", "Favlev", "Cnt")))
}
#Output
[1] "ENG"
[1] "PAR"
[1] "REL"
[1] "CLC"
str(temp)
#----------------
List of 4
$ ENG:'data.frame': 3 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 2 2 1
..$ GENDER: num [1:3] 2 3 1
..$ Favlev: num [1:3] 1 2 3
..$ Cnt : int [1:3] 1 1 2
$ PAR:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
$ REL:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
$ CLC:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
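As a variation on the same idea, stats::reformulate can build the formula from character vectors without pasting strings together. The sketch below reuses the question's data and column labels; the final stacking step and the Variable column are my own assumptions about what "adding to a permanent dataset" might look like.

```r
df <- data.frame(
  SurveyID = c("A", "B", "C", "D"), Panel = c("E", "E", "S", "S"),
  Category = c(1, 1, 2, 3), ENG = c(3, 3, 1, 2), PAR = c(3, 1, 1, 2),
  REL = c(3, 1, 1, 2), CLC = c(3, 1, 1, 2)
)

temp <- list()
for (i in colnames(df)[4:7]) {
  # reformulate() builds SurveyID ~ Panel + Category + <i> from strings
  form <- reformulate(c("Panel", "Category", i), response = "SurveyID")
  res <- aggregate(form, data = df, FUN = length)
  temp[[i]] <- setNames(res, c("Panel", "GENDER", "Favlev", "Cnt"))
}

# Assumed follow-up: stack the pieces into one data set, tagging each
# block of rows with the column it was computed from
combined <- do.call(
  rbind,
  Map(function(d, nm) cbind(Variable = nm, d), temp, names(temp))
)
str(combined)
```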
I think this is a simple question but I haven't found a suitable solution. To begin with a set of simplified data :
df <- as.data.frame(matrix(1:20, 5, 4))
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: int 6 7 8 9 10
# $ V3: int 11 12 13 14 15
# $ V4: int 16 17 18 19 20
We can see that all the columns are of class integer. What I want to achieve is converting the 4 columns to integer, numeric, character, and factor respectively. Of course, I can use
df$V1 <- as.XXX(df$V1)
for each column, but I think that's inefficient.
Expected Output
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
Question 2
I referred to @joran's answer in "R Assign (or copy) column classes from a data frame to another" and ran the following code:
myclass <- c("integer", "numeric", "character", "factor")
df.2 <- df
df.2[] <- mapply(FUN = as, df.2, myclass, SIMPLIFY = F)
When I print df.2, an error appears:
Error in as.character.factor(x) : malformed factor
However, calling str(df.2) works, and apparently only V1 and V3 were converted as requested.
str(df.2)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: int 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4:Formal class 'factor' [package "methods"] with 3 slots
# .. ..@ .Data : int 16 17 18 19 20
# .. ..@ levels : chr
# .. ..@ .S3Class: chr "factor"
Why can't the as function handle the classes numeric and factor?
We can use mapply and provide the functions as a list to convert the columns.
df <- as.data.frame(matrix(1:20, 5, 4))
df[] <- mapply(function(x, FUN) FUN(x),
df,
list(as.integer, as.numeric, as.character, as.factor),
SIMPLIFY = FALSE)
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
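A small variation, sketched here under the assumption that only some columns need converting: key the converter functions by column name and use Map, which is mapply with SIMPLIFY = FALSE. The name converters is only illustrative.

```r
df <- as.data.frame(matrix(1:20, 5, 4))

# Converters keyed by column name; V1 is left out and stays integer
converters <- list(V2 = as.numeric, V3 = as.character, V4 = as.factor)

# Map() returns a list with one converted column per converter
df[names(converters)] <- Map(function(f, col) f(col),
                             converters, df[names(converters)])

sapply(df, class)
```

Matching by name rather than by position means the column order of df can change without breaking the conversion.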
If you are open to a for loop, try this:
df <- as.data.frame(matrix(1:20, 5, 4))
type <- c("integer", "numeric", "character", "factor")
for(i in 1:ncol(df)){
call <- paste("as", type[i], sep = ".")
df[[i]] <- do.call(call, list(df[[i]]))
}
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
I just need to replicate my data.frame n times (e.g. 100) and save all the outputs into a list.
It should be quite easy and straightforward but I could not find any solution yet.
Fake data.frame:
df = read.table(text = 'a b
1 2
5 6
4 4
11 78
23 99', header = TRUE)
With lapply:
df_list <- lapply(1:100, function(x) df)
We can use replicate:
n <- 100
lst <- replicate(n, df, simplify = FALSE)
You can use rep if you wrap it in list, as rep tries to return the same type of object you pass it:
df_list <- rep(list(df), 100)
str(df_list[1:2])
#> List of 2
#> $ :'data.frame': 5 obs. of 2 variables:
#> ..$ a: int [1:5] 1 5 4 11 23
#> ..$ b: int [1:5] 2 6 4 78 99
#> $ :'data.frame': 5 obs. of 2 variables:
#> ..$ a: int [1:5] 1 5 4 11 23
#> ..$ b: int [1:5] 2 6 4 78 99
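To compare the three approaches side by side, here is a short sketch with a small n. It also checks that the list elements behave as independent copies, so modifying one does not affect the others. The object names are only illustrative.

```r
df <- data.frame(a = c(1L, 5L, 4L, 11L, 23L), b = c(2L, 6L, 4L, 78L, 99L))
n <- 3  # small n for illustration; the question uses 100

via_lapply    <- lapply(seq_len(n), function(i) df)
via_replicate <- replicate(n, df, simplify = FALSE)
via_rep       <- rep(list(df), n)

# All three give the same list
identical(via_lapply, via_replicate) && identical(via_replicate, via_rep)

# Elements are independent: changing one copy leaves the others untouched
via_rep[[1]]$a <- 0L
identical(via_rep[[2]], df)
```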
I want to convert variables into factors using apply():
a <- data.frame(x1 = rnorm(100),
x2 = sample(c("a","b"), 100, replace = T),
x3 = factor(c(rep("a",50) , rep("b",50))))
a2 <- apply(a, 2,as.factor)
apply(a2, 2,class)
results in:
x1 x2 x3
"character" "character" "character"
I don't understand why this results in character vectors instead of factor vectors.
apply converts your data.frame to a character matrix. Use lapply:
lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
In the second command, apply again converts the result to a character matrix. Using lapply instead:
a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
But for a quick look you could use str:
str(a)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: num -1.79 -1.091 1.307 1.142 -0.972 ...
# $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
Additional explanation according to comments:
Why does the lapply work while apply doesn't?
The first thing that apply does is to convert an argument to a matrix. So apply(a) is equivalent to apply(as.matrix(a)). As you can see str(as.matrix(a)) gives you:
chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "x1" "x2" "x3"
There are no more factors, so class returns "character" for all columns.
lapply works on columns so gives you what you want (it does something like class(a$column_name) for each column).
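The difference can be seen on a tiny data frame. This is a minimal sketch with made-up values, and the factor column is created explicitly so that it does not depend on the stringsAsFactors default.

```r
a <- data.frame(x1 = c(1.5, -0.25, 3),
                x3 = factor(c("a", "a", "b")))

m <- as.matrix(a)   # the conversion apply() performs first
typeof(m)           # a character matrix: numbers and factors became strings
apply(a, 2, class)  # hence the same class is reported for every column

sapply(a, class)    # works column by column on the data frame itself
```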
The help page for apply explains why apply and as.factor don't work together:
In all cases the result is coerced by
as.vector to one of the basic vector
types before the dimensions are set,
so that (for example) factor results
will be coerced to a character array.
The help page for sapply explains why sapply and as.factor don't work together either:
Value (...) An atomic vector or matrix
or list of the same length as X (...)
If simplification occurs, the output
type is determined from the highest
type of the return values in the
hierarchy NULL < raw < logical <
integer < real < complex < character <
list < expression, after coercion of
pairlists to lists.
You never get a matrix of factors or a data.frame.
How to convert output to data.frame?
Simple, use as.data.frame as you wrote in your comment:
a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
'data.frame': 100 obs. of 3 variables:
$ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
$ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
$ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
But if you want to replace selected character columns with factors, there is a trick:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: chr "a" "b" "c" "d" ...
$ x2: chr "A" "B" "C" "D" ...
$ x3: chr "A" "B" "C" "D" ...
columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: chr "A" "B" "C" "D" ...
You could use it to replace all columns using:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
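The same replacement trick can pick the columns by type instead of by name. A minimal sketch, assuming the goal is to convert every character column and leave the rest alone (the id column and the name is_chr are only for illustration):

```r
a3 <- data.frame(id = 1:3, x1 = c("a", "b", "c"), x2 = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

# Logical index of the character columns
is_chr <- vapply(a3, is.character, logical(1))

# Convert only those columns; id stays integer
a3[is_chr] <- lapply(a3[is_chr], as.factor)

str(a3)
```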