Append a data frame to a list - r

I'm trying to figure out how to add a data.frame or data.table to the first position in a list.
Ideally, I want a list structured as follows:
List of 4
$ :'data.frame': 1 obs. of 3 variables:
..$ a: num 2
..$ b: num 1
..$ c: num 3
$ d: num 4
$ e: num 5
$ f: num 6
Note the data.frame is an object within the structure of the list.
The problem is that I need to add the data frame to the list after the list has been created, and the data frame has to be the first element in the list. I'd like to do this using something simple like append, but when I try:
append(list(1,2,3),data.frame(a=2,b=1,c=3),after=0)
I get a list structured:
str(append(list(1,2,3),data.frame(a=2,b=1,c=3),after=0))
List of 6
$ a: num 2
$ b: num 1
$ c: num 3
$ : num 1
$ : num 2
$ : num 3
It appears that R is coercing the data.frame into a list when I try to append it. How do I prevent it from doing so? Or what alternative method is there for constructing this list, with the data.frame inserted at position 1 after the list's initial creation?

The issue you are having is that a data frame is itself a list of columns, so to put it anywhere into a list as a single list element, it must be wrapped with list(). Let's have a look.
df <- data.frame(1, 2, 3)
x <- as.list(1:3)
If we just wrap with c(), which is what append() is doing under the hood, we get
c(df)
# $X1
# [1] 1
#
# $X2
# [1] 2
#
# $X3
# [1] 3
But if we wrap it in list() we get the desired list element containing the data frame.
list(df)
# [[1]]
# X1 X2 X3
# 1 1 2 3
Therefore, since x is already a list, we will need to use the following construct.
c(list(df), x) ## or append(x, list(df), 0)
# [[1]]
# X1 X2 X3
# 1 1 2 3
#
# [[2]]
# [1] 1
#
# [[3]]
# [1] 2
#
# [[4]]
# [1] 3
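Applied to the data from the question, the same idea can be sketched as follows. The names d, e and f for the existing list elements are taken from the desired structure in the question; the variable names x and result are only for illustration.

```r
# The data frame to insert and the list that already exists
df <- data.frame(a = 2, b = 1, c = 3)
x <- list(d = 4, e = 5, f = 6)

# Wrap the data frame in list() so it is inserted as ONE element;
# after = 0 places it before the first existing element
result <- append(x, list(df), after = 0)

# Inspect the structure: the data frame is the first of four elements
str(result)
```

Because the data frame is wrapped, its columns are not spliced into the outer list, so result[[1]] gives back the original data frame.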

Related

How to operate with factors in data.frame?

I'm new to R and am trying to work with a data frame:
[screenshot of the data frame]
How do I get a numeric array from row 10?
ar <- df[10,1] didn't work.
You can use gsub to remove brackets. Please see the code below:
# Simulation
x <- factor(c("[1]", "[2,3]", "[4]", "[]"))
str(x)
# Factor w/ 4 levels "[]","[1]","[2,3]",..: 2 3 4 1
foobar <- lapply(x, function(x) {
# remove brackets
s <- gsub("\\[|\\]", "", as.character(x))
as.numeric(unlist(strsplit(s, split = ",")))
})
str(foobar)
Output:
List of 4
$ : num 1
$ : num [1:2] 2 3
$ : num 4
$ : num(0)
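To pull a single row out of a data frame in the same way, the conversion can be wrapped in a small helper. This is only a sketch: the data frame, its column name v, and the helper name parse_bracketed are all hypothetical, since the original data was only shown in a screenshot.

```r
# Hypothetical data: a factor column holding bracketed, comma-separated numbers
df <- data.frame(v = c("[1]", "[2,3]", "[4]", "[]"), stringsAsFactors = TRUE)

# Hypothetical helper: convert one bracketed value to a numeric vector
parse_bracketed <- function(value) {
  # as.character() turns the factor level into its label before cleaning
  s <- gsub("\\[|\\]", "", as.character(value))
  as.numeric(unlist(strsplit(s, split = ",")))
}

# Numeric vector for the second row of column v
ar <- parse_bracketed(df[2, "v"])
ar
```

The as.character() call matters: calling as.numeric() directly on a factor returns the internal level codes, not the numbers shown in the labels.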

What is the best way to concatenate dataframe with 2 different pattern in R

I have several dataframes:
toto1_1 <- data.frame(x=1:3)
toto1_2 <- data.frame(x=1:3)
titi1_1 <- data.frame(x=1:3)
titi1_2 <- data.frame(x=1:3)
What is the best way to concatenate these tables using 2 different patterns?
Thank you.
The mget function will return a list of data-objects when given a character vector:
totoList <- mget( paste0( rep(c("toto","titi"),each=2), rep(c("1_1","1_2"), 2) ) )
str(totoList)
List of 4
$ toto1_1:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
$ toto1_2:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
$ titi1_1:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
$ titi1_2:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
If the goal is a single data frame rather than a list, then that list could be an intermediate result on the way to:
do.call( "rbind", totoList) # do.call accepts the function name as a character string
x
toto1_1.1 1
toto1_1.2 2
toto1_1.3 3
toto1_2.1 1
toto1_2.2 2
toto1_2.3 3
titi1_1.1 1
titi1_1.2 2
titi1_1.3 3
titi1_2.1 1
titi1_2.2 2
titi1_2.3 3
Please see some suggestions below.
1. Concatenate the rows with the rbind() function.
2. Concatenate with the c() function. This does not return a data frame: the columns of each table become separate elements of one plain list.
3. Perhaps more useful, because it preserves the names of the original tables for later analysis and sorting: put each table into a named list, build a data.frame holding each table's name and its data, and then concatenate these into a single object using rbind().
toto1_1 <- data.frame(x=1:3)
toto1_2 <- data.frame(x=1:3)
titi1_1 <- data.frame(x=1:3)
titi1_2 <- data.frame(x=1:3)
#1
rbind(toto1_1,toto1_2,titi1_1,titi1_2)
#2
c(toto1_1,toto1_2,titi1_1,titi1_2)
#3
l <- list(toto1_1=toto1_1,toto1_2=toto1_2,titi1_1=titi1_1,titi1_2=titi1_2)
do.call(rbind,lapply(names(l),FUN=function(x) { data.frame(table_name=x,table_data=l[[x]]) }))

List of many without typing in R

I have the following code
n <- list(1,2,3)
str(n)
which outputs
> str(n)
List of 3
$ : num 1
$ : num 2
$ : num 3
I would like 100 of these, but when I do
n <- list(1:100)
str(n)
I get
List of 1
$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
The difference is a list of one element versus a list of three elements. How do I solve this in base R? Also, how would you solve it with the purrr package?
See ?as.list versus ?list. The short explanation is that in your first example you are storing three objects, each as its own element of the enclosing list. For example, if each number were named:
> list(a = 1, b = 2, c = 3)
$a
[1] 1
$b
[1] 2
$c
[1] 3
vs:
> list(1:3)
[[1]]
[1] 1 2 3
BUT...
> as.list(1:3)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
1:3 in R is an integer vector of length 3. list(1:3) stores that whole vector as a single list element, whereas as.list(1:3) turns each value into its own element, giving the same shape as list(1, 2, 3). So for 100 elements, use as.list(1:100).
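For the 100-element case from the question, a minimal base R sketch looks like this. The variable names n_int and n_num are just for illustration. The second variant converts to double first, because 1:100 is an integer vector while list(1, 2, 3) holds doubles (shown as num by str()).

```r
# One list element per value; the elements are integers
n_int <- as.list(1:100)

# Same shape, but with double ("num") elements like list(1, 2, 3)
n_num <- as.list(as.numeric(1:100))

# Show the structure of the first three elements only
str(n_num[1:3])
```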

How do I stop merge from converting characters into factors?

E.g.
chr <- c("a", "b", "c")
intgr <- c(1, 2, 3)
str(chr)
str(base::merge(chr,intgr, stringsAsFactors = FALSE))
gives:
> str(base::merge(chr,intgr, stringsAsFactors = FALSE))
'data.frame': 9 obs. of 2 variables:
$ x: Factor w/ 3 levels "a","b","c": 1 2 3 1 2 3 1 2 3
$ y: num 1 1 1 2 2 2 3 3 3
I originally thought it has something to do with how merge coerces arguments into data frames. However, I thought that adding the argument stringsAsFactors = FALSE would override the default coercion behaviour of char -> factor, yet this is not working.
EDIT: Doing the following gives me expected behaviour:
options(stringsAsFactors = FALSE)
str(base::merge(chr,intgr))
that is:
> str(base::merge(chr,intgr))
'data.frame': 9 obs. of 2 variables:
$ x: chr "a" "b" "c" "a" ...
$ y: num 1 1 1 2 2 2 3 3 3
but this is not ideal as it changes the global stringsAsFactors setting.
You can accomplish this particular "merge" using expand.grid(), since you're really just taking the cartesian product. This allows you to pass the stringsAsFactors argument:
sapply(expand.grid(x=chr, y=intgr, stringsAsFactors=FALSE), class)
## x y
## "character" "numeric"
Here's a way of working around this limitation of merge():
sapply(merge(data.frame(x=chr, stringsAsFactors=FALSE), intgr), class)
## x y
## "character" "numeric"
I would argue that it never makes sense to pass an atomic vector to merge(), since it is only really designed for merging data.frames.
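Following that argument, one way to sketch it is to build both data frames explicitly before merging, so the column types are decided where the data frames are created. The names left, right and crossed are illustrative only. As far as I know, from R 4.0.0 onward stringsAsFactors already defaults to FALSE, so the explicit argument mainly matters on older versions.

```r
chr <- c("a", "b", "c")
intgr <- c(1, 2, 3)

# Control the character-to-factor conversion when the data frame is built
left <- data.frame(x = chr, stringsAsFactors = FALSE)
right <- data.frame(y = intgr)

# With no columns in common, merge() returns the Cartesian product
crossed <- merge(left, right)
sapply(crossed, class)
```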
We can use CJ from data.table as well
library(data.table)
str(CJ(chr, intgr))
#Classes 'data.table' and 'data.frame': 9 obs. of 2 variables:
#$ V1: chr "a" "a" "a" "b" ...
#$ V2: num 1 2 3 1 2 3 1 2 3

R- Split + list function

Could anyone explain the split and list function in R? I am quite confused how to use them together. For example
x <- rnorm(10)
a <- gl(2,5)
b <- gl(5,2)
str(split(x,list(a,b)))
The result I get is
List of 10
$ 1.1: num [1:2] 0.1326 -0.0578
$ 2.1: num(0)
$ 1.2: num [1:2] 0.151 0.907
$ 2.2: num(0)
$ 1.3: num -0.393
$ 2.3: num 1.83
$ 1.4: num(0)
$ 2.4: num [1:2] 0.4266 -0.0116
$ 1.5: num(0)
$ 2.5: num [1:2] 0.62 1.64
How are values in x assigned to a level in list(a,b)? Why are there some levels without any values and some with many values? I do not see any relation between the values in x and the levels of list(a,b). Are they randomly assigned?
I would really appreciate it if someone could help me with this.
When you call split(x, list(a, b)), you are basically saying that two x values are in the same group if they have the same a and b value and are in different groups otherwise.
list(a, b)
# [[1]]
# [1] 1 1 1 1 1 2 2 2 2 2
# Levels: 1 2
#
# [[2]]
# [1] 1 1 2 2 3 3 4 4 5 5
# Levels: 1 2 3 4 5
We can see that the first two elements in x are going to be in group "1.1" (the group where a=1 and b=1), the next two will be in group 1.2, the next one will be in group 1.3, the next one will be in group 2.3, the next two will be in group 2.4, and the last two will be in group 2.5. This is exactly what we see when we call split(x, list(a, b)):
split(x, list(a, b))
# $`1.1`
# [1] -0.2431983 -1.5747339
# $`2.1`
# numeric(0)
# $`1.2`
# [1] -0.1058044 -0.8053585
# $`2.2`
# numeric(0)
# $`1.3`
# [1] -1.538958
# $`2.3`
# [1] 0.8363667
# $`1.4`
# numeric(0)
# $`2.4`
# [1] 0.8391658 -1.0488495
# $`1.5`
# numeric(0)
# $`2.5`
# [1] 0.3141165 -1.1813052
The reason you have extra empty groups (e.g. group 2.1) is that a and b have some pairs of values where there are no x values. From ?split, you can read that the way to not include these in the output is with the drop=TRUE option:
split(x, list(a, b), drop=TRUE)
# $`1.1`
# [1] -0.2431983 -1.5747339
# $`1.2`
# [1] -0.1058044 -0.8053585
# $`1.3`
# [1] -1.538958
# $`2.3`
# [1] 0.8363667
# $`2.4`
# [1] 0.8391658 -1.0488495
# $`2.5`
# [1] 0.3141165 -1.1813052
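To see the group labels directly, it may help to know that when the grouping argument is a list, split() combines the factors with interaction(). The sketch below uses x <- 1:10 instead of rnorm(10) so the result is reproducible; the names groups, by_list and by_factor are only for illustration.

```r
x <- 1:10        # fixed values instead of rnorm(10), for reproducibility
a <- gl(2, 5)    # 1 1 1 1 1 2 2 2 2 2
b <- gl(5, 2)    # 1 1 2 2 3 3 4 4 5 5

# One combined label per element of x, built from its a and b values
groups <- interaction(a, b)
as.character(groups)

# Splitting by the list and by the combined factor gives the same result
by_list <- split(x, list(a, b))
by_factor <- split(x, groups)
identical(by_list, by_factor)

# drop = TRUE keeps only the combinations that actually occur
names(split(x, list(a, b), drop = TRUE))
```

So the assignment is not random: each value of x goes to the group named after the a and b values at the same position.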
