Surprisingly, I haven't found any satisfying answer through my own research.
Suppose I have the following list:
bc <- list(row0 = 1, row1 = c(1, 1), row2 = c(1L, 2, 1), row3 = c(1, 3, 3, 1))
And I want to convert all numbers in that list into integers. My first, unsophisticated approach would be to place an "L" after each number. This would work, but it is quite unwieldy.
Second, I've tried
as.integer(bc[[1]])
But this only returns the converted values; it does not store them back in the list as integers.
Third, I've tried something similar suggested in another thread:
as.integer(unlist(bc))
Which indeed "unlists the list" and converts the numbers into integers; however, I'm now stuck when I want to restore the exact same list as before, with the list names.
There has to be a faster approach than this?
Thank you very much!
The map() function from purrr should help:
map(bc, as.integer)
# $row0
# [1] 1
#
# $row1
# [1] 1 1
#
# $row2
# [1] 1 2 1
#
# $row3
# [1] 1 3 3 1
str(purrr::map(bc, as.integer))
# List of 4
# $ row0: int 1
# $ row1: int [1:2] 1 1
# $ row2: int [1:3] 1 2 1
# $ row3: int [1:4] 1 3 3 1
The same result could be obtained with lapply(bc, as.integer) if you're in more of a base R kind of mood.
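If the converted values should end up back in the same variable, one option is to assign into the existing list. The following is a minimal base R sketch; bc_int, flat and bc_relisted are illustrative names that do not appear in the question, and relist() is shown only as one possible way to rebuild the structure after the unlist() attempt.

```r
bc <- list(row0 = 1, row1 = c(1, 1), row2 = c(1L, 2, 1), row3 = c(1, 3, 3, 1))

# Convert every element; lapply() keeps the list structure and names
bc_int <- lapply(bc, as.integer)

# Overwrite in place: assigning into bc[] keeps the existing names
bc[] <- lapply(bc, as.integer)

# Rebuild the structure from the flattened integers,
# using the original list as the skeleton
flat <- as.integer(unlist(bc))
bc_relisted <- relist(flat, skeleton = bc)

str(bc_relisted)
```

The bc[] form is handy when other code already refers to bc, since no new variable is introduced.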
Create a data.frame where a column is a list
^ this got me most of the way there, but there are significant hurdles I can't seem to clear.
Pretend frame is a data.frame of 3 columns with the middle column intended to be a list. This works:
frame[1,]$list_column <- list(1:4)
None of this works:
frame[1,] <- c(1, list(1:4), 3)
frame[1,] <- c(1, I(list(1:4)), 3)
frame[1,]$list_column <- list(1,3,5)
frame[1,]$list_column <- I(list(1,3,5))
In all cases R thinks I'm trying to add multiple things to a bucket that holds 1 thing and I don't know how to tell it otherwise. (And, btw, that last one is the thing I'd really like to do.)
The key is in creating your list correctly:
> list(1:4)
[[1]]
[1] 1 2 3 4
# Produces a list that contains a single vector
> list(1:4, 7:9)
[[1]]
[1] 1 2 3 4
[[2]]
[1] 7 8 9
# Produces a list that contains two separate vectors
> list(c(1:4, 7:9))
[[1]]
[1] 1 2 3 4 7 8 9
# Produces a list that contains a single vector
So you could do something like this:
frame <- data.frame(a=1:3)
frame$list_column <- NA
frame[1,]$list_column <- list(c(1, 3, 5))
frame[2,]$list_column <- list(1:5)
frame[3,]$list_column <- list(c(1:3, 5:9))
print(frame)
a list_column
1 1 1, 3, 5
2 2 1, 2, 3, 4, 5
3 3 1, 2, 3, 5, 6, 7, 8, 9
str(frame)
'data.frame': 3 obs. of 2 variables:
$ a : int 1 2 3
$ list_column:List of 3
..$ : num 1 3 5
..$ : int 1 2 3 4 5
..$ : int 1 2 3 5 6 7 8 9
Is that what you're after?
Update to address your other query:
frame <- data.frame(a=rep(NA, 3), b=NA, c=NA)
frame[1,] <- list(list(1), list(c(2,5,7)), list(3))
When you're getting unexpected results, have a look at the structure of the object you're dealing with:
> str(c(1, list(c(2,5,7)), 3))
List of 3
$ : num 1
$ : num [1:3] 2 5 7
$ : num 3
This shows that the second element in the list is a vector with 3 items. If you try to put that into a single data frame cell, you'll get a warning:
> frame <- data.frame(a=rep(NA, 3), b=NA, c=NA)
> frame[1,] <- c(1, list(c(2,5,7)), 3)
Warning message:
In `[<-.data.frame`(`*tmp*`, 1, , value = list(1, c(2, 5, 7), 3)) :
replacement element 2 has 3 rows to replace 1 rows
This is telling you that the number of elements doesn't match the number of slots in your data frame.
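As a small sketch of the same idea without row-wise assignment: the whole list column can be assigned at once, and a single cell can then be replaced with [[ ]]. This assumes a plain list column (not wrapped in I()); the values are made up for illustration.

```r
frame <- data.frame(a = 1:3)

# One list element per row; each element can have its own length
frame$list_column <- list(c(1, 3, 5), 1:5, c(1:3, 5:9))

# Replace a single cell; [[ ]] addresses one element of the list column,
# so the cell can hold a list of three items
frame$list_column[[1]] <- list(1, 3, 5)

str(frame$list_column[[1]])
lengths(frame$list_column)
```

Using [[ ]] here avoids the "multiple things in one bucket" problem, because the right-hand side is stored as one element rather than being spread over rows.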
I have a text file of integers which I've been reading into R and storing as a data frame for the time being. However, coercing it to a matrix (say y, using as.matrix()) doesn't seem to give the same thing as the matrix I created (x). Namely, if I look at a single entry I get different output:
> y[1,1]
V1
0
as opposed to
> x[1,1]
[1] 0
Can anyone explain the difference?
I am interpreting your question as asking what the difference is between a matrix and a data frame, and not just why the output of y[1,1] looks different depending on whether y is a data frame or a matrix. If all you want to know is why they look different, the answer is that data frames and matrices are different classes with different internal representations. Many operations have been designed to paper over the differences, but matrix indexing and data frame indexing are implemented separately and do not have to behave identically, although hopefully they are reasonably consistent. At this point it would likely be unwise to modify R to reduce any inconsistencies, given how much code that might break.
matrix
A matrix is a vector with dimensions.
m1 <- 1:12
dim(m1) <- c(4, 3)
m2 <- matrix(1:12, 4, 3)
identical(m1, m2)
## [1] TRUE
length(m1) # 12 elements in the underlying vector
## [1] 12
data frame
A data.frame is a named list (the names are the column names) of columns with row names -- the default row names of 1, 2, ... are internally represented as c(NA, -4L) for a 4 row data frame in order to avoid having to store a possibly large vector of row names.
DF1 <- as.data.frame(m1)
DF2 <- list(V1 = 1:4, V2 = 5:8, V3 = 9:12)
attr(DF2, "row.names") <- c(NA, -4L)
class(DF2) <- "data.frame"
identical(DF1, DF2)
## [1] TRUE
length(DF1) # 3 columns
## [1] 3
names
Matrices do not have to have row or column names whereas data frames always do. If a matrix has row and column names then they are represented as a list of two vectors called dimnames (as opposed to a named list with a row.names attribute which is how data frames represent their row names).
m3 <- m1
rownames(m3) <- c("a", "b", "c", "d")
colnames(m3) <- c("A", "B", "C")
str(m3)
## int [1:4, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:4] "a" "b" "c" "d"
## ..$ : chr [1:3] "A" "B" "C"
m4 <- m1
dimnames(m4) <- list(c("a", "b", "c", "d"), c("A", "B", "C"))
identical(m3, m4)
## [1] TRUE
lapply
Suppose we lapply over matrix m1. Since it is really a vector with dimensions we are lapplying over each of the 12 elements:
> str(lapply(m1, length))
List of 12
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
$ : int 1
whereas if we do this over DF1 we are lapplying over 3 elements each of which has length 4
> str(lapply(DF1, length))
List of 3
$ V1: int 4
$ V2: int 4
$ V3: int 4
double indexing
Indexing is such that DF1[1,1] and m1[1,1] give the same result if the matrix does not have names.
DF1[1,1]
## [1] 1
m1[1,1]
## [1] 1
If it does then there is the observed difference:
as.matrix(DF1)[1,1] # as.matrix(DF1) has col names V1, V2, V3 from DF1
V1
1
DF1[1,1]
[1] 1
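If the name on the extracted element is unwanted, here is a short sketch of two ways to drop it. DF1 is rebuilt here so the example is self-contained, and y is just an illustrative name for the converted matrix.

```r
DF1 <- as.data.frame(matrix(1:12, 4, 3))
y <- as.matrix(DF1)      # column names V1, V2, V3 are kept

y[1, 1]                  # single element, carries the column name V1
y[[1, 1]]                # [[ ]] extracts the element without its name
unname(y)[1, 1]          # or strip the dimnames first
```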
One has to be careful when converting a data frame to a matrix because if there are character and numeric columns in the data frame then the conversion will force them all to the same type, i.e. all to character.
single indexing
With a single index, however, since a data frame is a list of columns, we get a data frame made of the first column
> DF1[1]
V1
1 1
2 2
3 3
4 4
but for a matrix since it is a vector with dimensions we get the first element of that vector
> m1[1]
[1] 1
other
In the usual case all elements of a matrix are numeric, or all are character but for a data frame each column might be different. One column might be numeric whereas another might be character or logical.
Typically operations on matrices are faster than operations on data frames.
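The point about column types can be seen in a small sketch. The data frame mixed and its columns are invented for illustration; stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
mixed <- data.frame(id = 1:3,
                    label = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# Each column of the data frame keeps its own type
sapply(mixed, class)

# A matrix has a single type, so everything becomes character
m <- as.matrix(mixed)
typeof(m)

# Converting only the numeric column keeps it numeric
num_only <- as.matrix(mixed["id"])
typeof(num_only)
```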
The attributes assigned to data structures also depend on the methods used to import or read data, and on whether they are explicitly defined or coerced using other functions.
Here is a data frame called integers created by importing data from a .txt file.
> integers
V1 V2 V3
1 1 5 9
2 2 6 10
3 3 7 11
4 4 8 12
Here is a matrix called m.integers created by passing integers to as.matrix()
> m.integers <- as.matrix(integers)
> m.integers
V1 V2 V3
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
Here is a matrix called m2 created as indicated above by using matrix()
> m2
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
Now selecting the first element of each structure gives the following.
Also looking at the attributes of each reveals the default values (or assigned values if you assigned any) for each attribute.
# The element has no name.
> integers[1,1]
[1] 1
# Notice attributes$row.names
> attributes(integers)
$names
[1] "V1" "V2" "V3"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4
#################################################
# The element is given a column name.
> m.integers[1,1]
V1
1
# Notice there is no row name attribute
> attributes(m.integers)
$dim
[1] 4 3
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "V1" "V2" "V3"
###############################################
# The element has no name.
> m2[1,1]
[1] 1
# Notice no row name attribute.
> attributes(m2)
$dim
[1] 4 3
According to the documentation for data.frame(), the default is row.names = NULL, in which case the row names are set to the integer sequence starting at 1. These automatic row names are not carried over by as.matrix(), but the column names are. A matrix created with matrix() has no row names at all unless you assign them; the [1,], [2,] labels in the printout are only positional indices, as attributes(m2) shows.
If necessary, the row names can be changed.
> attributes(integers)$row.names <- c("one", "two", "three", "four")
> integers
V1 V2 V3
one 1 5 9
two 2 6 10
three 3 7 11
four 4 8 12
> attributes(m.integers)$dimnames[[2]] <- NULL
> m.integers
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> attributes(m.integers)$dimnames[[1]] <- c("one", "two", "three", "four")
> m.integers
[,1] [,2] [,3]
one 1 5 9
two 2 6 10
three 3 7 11
four 4 8 12
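The same changes can also be made with the accessor functions rownames(), colnames() and dimnames() instead of going through attributes(). This is only a sketch: the data frame is built directly here rather than read from a .txt file, which is an assumption made to keep the example self-contained.

```r
integers <- data.frame(V1 = 1:4, V2 = 5:8, V3 = 9:12)

# Automatic row names are not carried over by as.matrix()
m.integers <- as.matrix(integers)
rownames(m.integers)             # NULL
colnames(m.integers)             # column names survive

# Explicit row names are carried over
rownames(integers) <- c("one", "two", "three", "four")
rownames(as.matrix(integers))

# Set both dimensions of a plain matrix in one step
m2 <- matrix(1:12, 4, 3)
dimnames(m2) <- list(c("one", "two", "three", "four"),
                     c("V1", "V2", "V3"))
```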
I'm trying to figure out how to add a data.frame or data.table to the first position in a list.
Ideally, I want a list structured as follows:
List of 4
$ :'data.frame': 1 obs. of 3 variables:
..$ a: num 2
..$ b: num 1
..$ c: num 3
$ d: num 4
$ e: num 5
$ f: num 6
Note the data.frame is an object within the structure of the list.
The problem is that I need to add the data frame to the list after the list has been created, and the data frame has to be the first element in the list. I'd like to do this using something simple like append, but when I try:
append(list(1,2,3),data.frame(a=2,b=1,c=3),after=0)
I get a list structured:
str(append(list(1,2,3),data.frame(a=2,b=1,c=3),after=0))
List of 6
$ a: num 2
$ b: num 1
$ c: num 3
$ : num 1
$ : num 2
$ : num 3
It appears that R is coercing the data.frame into a list when I try to append it. How do I prevent that? Or what alternative method is there for constructing this list, inserting the data.frame in position 1 after the list's initial creation?
The issue you are having is that to put a data frame anywhere into a list as a single list element, it must be wrapped with list(). Let's have a look.
df <- data.frame(1, 2, 3)
x <- as.list(1:3)
If we just wrap with c(), which is what append() is doing under the hood, we get
c(df)
# $X1
# [1] 1
#
# $X2
# [1] 2
#
# $X3
# [1] 3
But if we wrap it in list() we get the desired list element containing the data frame.
list(df)
# [[1]]
# X1 X2 X3
# 1 1 2 3
Therefore, since x is already a list, we will need to use the following construct.
c(list(df), x) ## or append(x, list(df), 0)
# [[1]]
# X1 X2 X3
# 1 1 2 3
#
# [[2]]
# [1] 1
#
# [[3]]
# [1] 2
#
# [[4]]
# [1] 3
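Applied to the structure from the question, the same construct might look like the sketch below. The names d, e and f are taken from the desired str() output in the question; res is an illustrative name.

```r
x <- list(d = 4, e = 5, f = 6)
df <- data.frame(a = 2, b = 1, c = 3)

# Wrapping df in list() makes it one element; after = 0 puts it first
res <- append(x, list(df), after = 0)

str(res)
```

The data frame ends up as a single unnamed first element, followed by d, e and f.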
I have a data frame with two columns: one is strings, the other one is integers.
> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> df
x rnames
1 5 item.1
2 3 item.2
3 5 item.3
4 3 item.4
5 1 item.5
6 3 item.6
7 4 item.7
8 5 item.8
9 4 item.9
10 5 item.10
11 5 item.11
12 2 item.12
13 2 item.13
14 1 item.14
15 3 item.15
16 4 item.16
17 5 item.17
18 4 item.18
19 1 item.19
20 1 item.20
I'm trying to aggregate the strings into list or vectors of strings (characters) with the 'c' or the 'list' function, but getting weird results:
> aggregate(rnames ~ x, df, c)
x rnames
1 1 16, 6, 11, 13
2 2 4, 5
3 3 12, 15, 17, 7
4 4 18, 20, 8, 10
5 5 1, 14, 19, 2, 3, 9
When I use 'paste' instead of 'c', I can see that the aggregate is working correctly - but the result is not what I'm looking for.
> aggregate(rnames ~ x, df, paste)
x rnames
1 1 item.5, item.14, item.19, item.20
2 2 item.12, item.13
3 3 item.2, item.4, item.6, item.15
4 4 item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17
What I'm looking for is that every aggregated group would be presented as a vector or a list (hence the use of c) as opposed to the single string I'm getting with 'paste'. Something along the lines of the following (which in reality doesn't work):
> aggregate(rnames ~ x, df, c)
x rnames
1 1 item.5, item.14, item.19, item.20
2 2 item.12, item.13
3 3 item.2, item.4, item.6, item.15
4 4 item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17
Any help would be appreciated.
You fell into the usual data.frame trap: your character column is not a character column, it is a factor column! Hence the numbers instead of the characters in your result:
> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> str(df)
'data.frame': 20 obs. of 2 variables:
$ x : int 2 5 5 5 5 4 3 3 2 4 ...
$ rnames: Factor w/ 20 levels "item.1","item.10",..: 1 12 14 15 16 17 18 19 20 2 ...
To prevent the conversion to factors, use the argument stringsAsFactors=FALSE in your call to data.frame (since R 4.0.0 this is the default, so the conversion only happens on older versions):
> df <- data.frame(x, rnames,stringsAsFactors=FALSE)
> str(df)
'data.frame': 20 obs. of 2 variables:
$ x : int 5 5 3 5 5 3 2 5 1 5 ...
$ rnames: chr "item.1" "item.2" "item.3" "item.4" ...
> aggregate(rnames ~ x, df, c)
x rnames
1 1 item.9, item.13, item.17
2 2 item.7
3 3 item.3, item.6, item.19
4 4 item.12, item.15, item.16
5 5 item.1, item.2, item.4, item.5, item.8, item.10, item.11, item.14, item.18, item.20
Another solution to avoid the conversion to factor is function I:
> df <- data.frame(x, I(rnames))
> str(df)
'data.frame': 20 obs. of 2 variables:
$ x : int 3 5 4 5 4 5 3 3 1 1 ...
$ rnames:Class 'AsIs' chr [1:20] "item.1" "item.2" "item.3" "item.4" ...
Excerpt from ?I:
In function data.frame. Protecting an object by enclosing it in I() in
a call to data.frame inhibits the conversion of character vectors to
factors and the dropping of names, and ensures that matrices are
inserted as single columns. I can also be used to protect objects
which are to be added to a data frame, or converted to a data frame
via as.data.frame.
It achieves this by prepending the class "AsIs" to the object's
classes. Class "AsIs" has a few of its own methods, including for [,
as.data.frame, print and format.
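To see why a factor column produces numbers, it can help to look at a factor on its own. This is a small sketch with made-up values; f is an illustrative name.

```r
f <- factor(c("item.2", "item.1", "item.10"))

levels(f)         # the distinct labels, sorted
as.integer(f)     # the underlying integer codes (positions in levels)
as.character(f)   # the labels, which is usually what is wanted
```

Converting with as.character() before aggregating gives the labels rather than the codes.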
I'm not sure exactly what it is that you are looking for... so perhaps some reference output would be good to give us an idea of what we are aiming at?
But, since your last bit of code seems to be close to what you are after, maybe a solution like the following would work:
> library(plyr)
> ddply(df, .(x), summarize, rnames = paste(rnames, collapse = "|"))
x rnames
1 1 item.9|item.11|item.20
2 2 item.1|item.2|item.15|item.16
3 3 item.7|item.8
4 4 item.4|item.5|item.6|item.12|item.13
5 5 item.3|item.10|item.14|item.17|item.18|item.19
You can vary how the individual elements are stuck together by changing the collapse argument to paste().
Alternatively, if you just want to have each of the groups as a vector, then you could use this:
> df$rnames = as.character(df$rnames)
> L = dlply(df, .(x), function(df) {df$rnames})
> L
$`1`
[1] "item.9" "item.11" "item.20"
$`2`
[1] "item.1" "item.2" "item.15" "item.16"
$`3`
[1] "item.7" "item.8"
$`4`
[1] "item.4" "item.5" "item.6" "item.12" "item.13"
$`5`
[1] "item.3" "item.10" "item.14" "item.17" "item.18" "item.19"
attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
x
1 1
2 2
3 3
4 4
5 5
This gives you a list of vectors, which is what you were after. And each group can be indexed out of the resulting list:
> L[[1]]
[1] "item.9" "item.11" "item.20"
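If loading plyr is not desirable, a similar list of vectors can be sketched in base R with split(). The small data frame below uses fixed values instead of sample() so that the grouping is reproducible; groups is an illustrative name.

```r
x <- c(1, 2, 1, 3, 2)
rnames <- paste("item", 1:5, sep = ".")
df <- data.frame(x, rnames, stringsAsFactors = FALSE)

# One character vector per distinct value of x
groups <- split(df$rnames, df$x)

groups
groups[["1"]]
```

Unlike the dlply() result, the list returned by split() carries no split_type or split_labels attributes, only the group names.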