dput a long list - shorten list but preserve structure - r

If we want to make a reproducible question on a complex/large dataset for SO, we can use dput(head(df)) to reduce the size.
Is there a similar approach to reduce the size of complex nested lists with varying list lengths? I'm thinking an approach could be to take the first few elements from each list (say first 3) irrespective of individual list type (numeric, character etc.) and nested structure but I'm not sure how to do this.
#sample nested list
L <- list(
list(1:10),
list( list(1:10), list(1:10,1:10) ),
list(list(list(list(1:10))))
)
Running dput(L) will naturally produce the structure for the whole list. Is there a simple way to reduce the overall length of this list (something like dput(head(L)))?
I don't want to edit the structure of the list, e.g. I don't want to flatten first or anything - I just want to reduce the size of it and keep all attributes etc.
Thanks
Edit
@thelatemail's solution works well:
rapply(L, f = head, n = 3, how = "list")
What if the list contains a data.frame, though? This approach splits the df into separate lists (which I assume is expected, since how = "list" is specified in the rapply call). Is there a way to modify this so that it returns head(df) as a data.frame? Example with a df included:
L_with_df <- list(
list(1:10),
list( list(1:10), list(1:10,1:10), df = data.frame(a = 1:20, b = 21:40) ),
list(list(list(list(1:10))))
)
rapply(L_with_df, f = head, n = 3, how = "list")
Edit 2
It seems rapply won't work on data.frames.
However, rrapply, which is an extension of rapply, seems to do what I want:
library(rrapply)
rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE)
# [[1]]
# [[1]][[1]]
# [1] 1 2 3
# [[2]]
# [[2]][[1]]
# [[2]][[1]][[1]]
# [1] 1 2 3
# [[2]][[2]]
# [[2]][[2]][[1]]
# [1] 1 2 3
# [[2]][[2]][[2]]
# [1] 1 2 3
# [[2]]$df
# a b
# 1 1 21
# 2 2 22
# 3 3 23
# [[3]]
# [[3]][[1]]
# [[3]][[1]][[1]]
# [[3]][[1]][[1]][[1]]
# [[3]][[1]][[1]][[1]][[1]]
# [1] 1 2 3
# Warning message:
# In rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE) :
# 'dfaslist' is deprecated, use classes = 'data.frame' instead
#this produces different output?:
#rrapply(L_with_df, f = head, n = 3, classes = "data.frame")

Let's create a nested list to serve as an example.
L <- list(
list(1:10),
list( list(1:10), list(1:10,1:10) ),
list(list(list(list(1:10))))
)
It has this structure:
str(L)
#List of 3
# $ :List of 1
# ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 2
# ..$ :List of 1
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# ..$ :List of 2
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 1
# ..$ :List of 1
# .. ..$ :List of 1
# .. .. ..$ :List of 1
# .. .. .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
I think the recursive apply, rapply, with f = head can handle this without breaking the structure.
rapply(L, f=head, n=3, how="list")
Checks out:
str(rapply(L, f=head, n=3, how="list"))
#List of 3
# $ :List of 1
# ..$ : int [1:3] 1 2 3
# $ :List of 2
# ..$ :List of 1
# .. ..$ : int [1:3] 1 2 3
# ..$ :List of 2
# .. ..$ : int [1:3] 1 2 3
# .. ..$ : int [1:3] 1 2 3
# $ :List of 1
# ..$ :List of 1
# .. ..$ :List of 1
# .. .. ..$ :List of 1
# .. .. .. ..$ : int [1:3] 1 2 3
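If the list may also contain data frames, a small hand-written recursion is one base-R option. The sketch below is only an illustration: shorten is a hypothetical helper name (not part of any package), and it assumes that keeping the first n rows of a data frame and the first n elements of any other leaf is what you want. Note that lapply keeps the names of each sub-list but drops its other attributes.

```r
# Hypothetical helper: truncate every leaf to its first n elements,
# treating a data.frame as a leaf so that head() keeps its first n rows.
shorten <- function(x, n = 3) {
  if (is.data.frame(x)) {
    head(x, n)
  } else if (is.list(x)) {
    lapply(x, shorten, n = n)   # recurse into plain lists
  } else {
    head(x, n)                  # atomic vectors and other leaves
  }
}

L_with_df <- list(
  list(1:10),
  list(list(1:10), list(1:10, 1:10), df = data.frame(a = 1:20, b = 21:40)),
  list(list(list(list(1:10))))
)

short <- shorten(L_with_df)
str(short)
```

The result can then be passed to dput(short) to get a compact, structure-preserving example.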

Related

Create formula inside loop over column names

I would like to loop through columns in a data set and use the name of the column to aggregate the data set. However, I am getting an error when I try to feed through the column name into the aggregate function:
"Error in model.frame.default(formula = cbind(SurveyID) ~ Panel + Category + :
variable lengths differ (found for 'i')"
Once I can store this in a temp file, I will add the temp file to a permanent dataset; however, I can't get past this part. Any help would be much appreciated!
#example of my data:
df <- data.frame("SurveyID" = c('A','B','C','D'), "Panel" = c('E','E','S','S'), "Category" = c(1,1,2,3),
"ENG" = c(3,3,1,2), "PAR" = c(3,1,1,2), "REL" = c(3,1,1,2), "CLC" = c(3,1,1,2))
#for loop to get column name to include as part of the aggregate function
for (i in colnames(df[4:7])) {
print (i)
temp <- data.frame(setNames(aggregate(cbind(SurveyID) ~ Panel + Category + i, data = df, FUN = length), c("Panel","GENDER", "Favlev", "Cnt")))
}
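You are making one newbie mistake and one more sophisticated mistake:
Newbie mistake: not indexing the result on assignment, so each iteration overwrites temp with the new value.
Less obvious mistake: improper construction of the formula object. Inside a formula, i is looked up as a variable called i (here a length-one string), not as the column it names, so build the formula from a string with as.formula.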
temp <- list() # start with an empty list so each result can be stored by name
for (i in colnames(df[4:7])) {
print(i)
form <- as.formula(paste("SurveyID ~ Panel + Category +", i))
temp[[i]] <- data.frame(setNames(aggregate(form, data = df, FUN = length), c("Panel","GENDER", "Favlev", "Cnt")))
}
#Output
[1] "ENG"
[1] "PAR"
[1] "REL"
[1] "CLC"
str(temp)
#----------------
List of 4
$ ENG:'data.frame': 3 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 2 2 1
..$ GENDER: num [1:3] 2 3 1
..$ Favlev: num [1:3] 1 2 3
..$ Cnt : int [1:3] 1 1 2
$ PAR:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
$ REL:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
$ CLC:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
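As an alternative to pasting a string, the formula can be assembled from the column names with reformulate() from the stats package. This is a minimal sketch, not the answerer's method: score_cols and counts are illustrative names, and the output column names "Level" and "Cnt" are assumptions chosen for the example.

```r
df <- data.frame(
  SurveyID = c("A", "B", "C", "D"),
  Panel    = c("E", "E", "S", "S"),
  Category = c(1, 1, 2, 3),
  ENG = c(3, 3, 1, 2), PAR = c(3, 1, 1, 2),
  REL = c(3, 1, 1, 2), CLC = c(3, 1, 1, 2)
)

score_cols <- c("ENG", "PAR", "REL", "CLC")

# One aggregated data frame per score column, stored in a list.
counts <- lapply(score_cols, function(col) {
  # Builds SurveyID ~ Panel + Category + <col> from character names.
  form <- reformulate(c("Panel", "Category", col), response = "SurveyID")
  out <- aggregate(form, data = df, FUN = length)
  setNames(out, c("Panel", "Category", "Level", "Cnt"))
})
names(counts) <- score_cols

str(counts)
```

Using lapply instead of a for loop also avoids the overwriting problem, because every result becomes its own list element.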

Convert Multiple Column Classes

I think this is a simple question but I haven't found a suitable solution. To begin with a set of simplified data :
df <- as.data.frame(matrix(1:20, 5, 4))
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: int 6 7 8 9 10
# $ V3: int 11 12 13 14 15
# $ V4: int 16 17 18 19 20
We can see that all the columns are of class integer. What I want to achieve is converting the 4 columns to integer, numeric, character, and factor respectively. Of course, I can use
df$V1 <- as.XXX(df$V1)
for each column, but I think it's inefficient.
Expected Output
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
Question 2
I referred to @joran's answer in "R Assign (or copy) column classes from a data frame to another" and ran the following code:
myclass <- c("integer", "numeric", "character", "factor")
df.2 <- df
df.2[] <- mapply(FUN = as, df.2, myclass, SIMPLIFY = F)
When I call df.2, an error appears :
Error in as.character.factor(x) : malformed factor
However, calling str(df.2) works, and apparently only V1 and V3 were converted as requested.
str(df.2)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: int 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4:Formal class 'factor' [package "methods"] with 3 slots
# .. ..@ .Data : int 16 17 18 19 20
# .. ..@ levels : chr
# .. ..@ .S3Class: chr "factor"
Why can't the as function handle the classes numeric and factor?
We can use mapply and provide the functions as a list to convert the columns.
df <- as.data.frame(matrix(1:20, 5, 4))
df[] <- mapply(function(x, FUN) FUN(x),
df,
list(as.integer, as.numeric, as.character, as.factor),
SIMPLIFY = FALSE)
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
If a for loop is acceptable, try this:
df <- as.data.frame(matrix(1:20, 5, 4))
type <- c("integer", "numeric", "character", "factor")
for(i in 1:ncol(df)){
call <- paste("as", type[i], sep = ".")
df[[i]] <- do.call(call, list(df[[i]]))
}
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
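When only some columns need converting, the same idea can be keyed by column name rather than by position. The following is a sketch under that assumption; converters is an illustrative name, and columns not listed in it (here V1) are left untouched.

```r
df <- as.data.frame(matrix(1:20, 5, 4))

# Named list: column name -> conversion function to apply to it.
converters <- list(V2 = as.numeric, V3 = as.character, V4 = as.factor)

# Map() pairs each function with the column of the same name.
df[names(converters)] <- Map(function(f, col) f(col),
                             converters, df[names(converters)])

str(df)
```

Matching by name means the code keeps working if the column order of df changes.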

R appending multiple data tables to a list

I need to append data.tables to an empty list in such a way that indexing the list returns an entire data table.
What I'm getting now is a list of the columns of the appended data tables, so that at a particular index I get a column from one of those tables instead of a data table.
library(data.table)
empty_list <- list()
dt <- data.table(c(1,2,3,4,5,6), c(4,5,6,7,8,9))
empty_list <- append(empty_list, dt)
empty_list[1]
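append concatenates lists, and a data.table is itself a list of columns, so you want to append to empty_list another list(...) object that wraps the table. So:
library(data.table)
empty_list <- list()
dt <- data.table(c(1,2,3,4,5,6), c(4,5,6,7,8,9))
empty_list <- append(empty_list, list(dt))
empty_list[1]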
#[[1]]
# V1 V2
#1: 1 4
#2: 2 5
#3: 3 6
#4: 4 7
#5: 5 8
#6: 6 9
As a simple representative example, consider that a data.table/data.frame is really just a fancy list.
is.list(data.table(3,4))
#[1] TRUE
str(append(list(1,2), list(3,4)))
#List of 4
# $ : num 1
# $ : num 2
# $ : num 3
# $ : num 4
str(append(list(1,2), list(list(3,4))))
#List of 3
# $ : num 1
# $ : num 2
# $ :List of 2
# ..$ : num 3
# ..$ : num 4
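The same behaviour can be seen with plain data frames, which are also lists of columns. The sketch below uses base data.frame instead of data.table only so that it runs without extra packages (that substitution is an assumption made for the example); tables and flattened are illustrative names. It also shows [[<- assignment as another way to add a whole table.

```r
dt1 <- data.frame(V1 = 1:3, V2 = 4:6)
dt2 <- data.frame(V1 = 7:9, V2 = 10:12)

tables <- list()
tables[[length(tables) + 1]] <- dt1    # adds the whole table as one element
tables <- append(tables, list(dt2))    # same effect via append() + list()

# Without the list() wrapper, the columns are appended one by one.
flattened <- append(list(), dt1)

length(tables)      # number of stored tables
length(flattened)   # number of columns in dt1
tables[[2]]         # [[ ]] returns the table itself; [ ] returns a list of length 1
```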

Replicate same data.frame n times and save them into a list

I just need to replicate my data.frame n times (e.g. 100) and save all the outputs into a list.
It should be quite easy and straightforward but I could not find any solution yet.
Fake data.frame:
df = read.table(text = 'a b
1 2
5 6
4 4
11 78
23 99', header = TRUE)
With lapply:
df_list <- lapply(1:100, function(x) df)
We can use replicate
n <- 100
lst <- replicate(n, df, simplify = FALSE)
You can use rep if you wrap it in list, as rep tries to return the same type of object you pass it:
df_list <- rep(list(df), 100)
str(df_list[1:2])
#> List of 2
#> $ :'data.frame': 5 obs. of 2 variables:
#> ..$ a: int [1:5] 1 5 4 11 23
#> ..$ b: int [1:5] 2 6 4 78 99
#> $ :'data.frame': 5 obs. of 2 variables:
#> ..$ a: int [1:5] 1 5 4 11 23
#> ..$ b: int [1:5] 2 6 4 78 99
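Each element of the resulting list behaves as an independent copy, so changing one does not affect the others or the original. A minimal sketch of that, using a small n and an arbitrary replacement value chosen for illustration:

```r
df <- data.frame(a = c(1, 5, 4, 11, 23), b = c(2, 6, 4, 78, 99))

df_list <- rep(list(df), 3)

# Modify only the first copy.
df_list[[1]]$a[1] <- 999

df_list[[1]]$a[1]   # changed
df_list[[2]]$a[1]   # unchanged
df$a[1]             # original unchanged
```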

convert list of list to a coerced dataframe

Here is a list of lists x, generated as follows:
list1 <- list(NULL, as.integer(0))
list2 <- list(NULL, as.integer(1))
list3 <- list(1:5, 0:4)
x <- list(a=list1, b=list2, c=list3)
x has the following structure:
str(x)
List of 3
$ a:List of 2
..$ : NULL
..$ : int 0
$ b:List of 2
..$ : NULL
..$ : int 1
$ c:List of 2
..$ : int [1:5] 1 2 3 4 5
..$ : int [1:5] 0 1 2 3 4
I'm trying to convert it to a coerced data frame. I first used
xc <- data.frame(lapply(x, as.numeric))
and got the following error:
Error in lapply(x, as.numeric) :
(list) object cannot be coerced to type 'double'
It only works with as.character as the argument.
My goal is a data frame with the following structure:
str(xc)
'data.frame': 2 obs. of 3 variables:
$ a: int NA 0 ...
$ b: int NA 1 ...
$ c: int [1:5] 1 2 3 4 5 int [1:5] 0 1 2 3 4
I think the columns of the resulting data frame must be lists (this is the type that can handle multiple vectors and NULL values).
Using dplyr or data.table package is probably the easiest way.
You can then convert it back to base data.frame with as.data.frame:
library(data.table)
xc <- as.data.table(x)
or
library(dplyr)
xc <- as_tibble(x) # as_data_frame() is deprecated in favour of as_tibble()
After converting to base data.frame, the result is the same:
as.data.frame(xc)
#> a b c
#> 1 NULL NULL 1, 2, 3, 4, 5
#> 2 0 1 0, 1, 2, 3, 4
The columns are lists:
str(as.data.frame(xc))
#> 'data.frame': 2 obs. of 3 variables:
#> $ a:List of 2
#> ..$ : NULL
#> ..$ : int 0
#> $ b:List of 2
#> ..$ : NULL
#> ..$ : int 1
#> $ c:List of 2
#> ..$ : int 1 2 3 4 5
#> ..$ : int 0 1 2 3 4
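A base-R route to the same kind of list columns is to wrap each inner list in I() before calling data.frame, which stops data.frame from trying to expand the list into separate columns. This is a sketch under that assumption, not the answerer's method; note that the resulting columns carry the extra class "AsIs".

```r
list1 <- list(NULL, as.integer(0))
list2 <- list(NULL, as.integer(1))
list3 <- list(1:5, 0:4)
x <- list(a = list1, b = list2, c = list3)

# Equivalent to data.frame(a = I(list1), b = I(list2), c = I(list3)).
xc <- do.call(data.frame, lapply(x, I))

dim(xc)        # rows and columns
xc$c[[1]]      # first cell of column c is a whole integer vector
```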

Resources