How to convert matrix (or list) into data.frame - r

I've got a list with different types in it. They are arranged in matrix form:
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2,3)
tmp
     [,1] [,2] [,3]
[1,] "a"  "b"  "c"
[2,] 1    2    3
That's the form I get it out of another more complex function.
Now I want to transpose it and convert to a data.frame. So I do the following:
data <- as.data.frame(t(tmp))
data
  V1 V2
1  a  1
2  b  2
3  c  3
This looks great. But it's got the wrong structure:
str(data)
'data.frame': 3 obs. of 2 variables:
$ V1:List of 3
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
$ V2:List of 3
..$ : num 1
..$ : num 2
..$ : num 3
So how do I get rid of the extra level of lists?

This should do the trick:
df <- data.frame(lapply(data.frame(t(tmp)), unlist), stringsAsFactors=FALSE)
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ X1: chr "a" "b" "c"
# $ X2: num 1 2 3
The inner data.frame() call converts the matrix into a two-column data.frame. Each column is still a list at this point (one holding the character values, one holding the numeric values).
lapply(..., unlist) strips away the extra list() layer, turning each column into an atomic vector.
The outer data.frame() call converts the resulting list of vectors into the data.frame you're after.
Because the inner columns are lists rather than character vectors, stringsAsFactors has no effect on the inner call. It matters only in the outer call, and only in R versions before 4.0.0, where it defaulted to TRUE.
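To see what each step contributes, the one-liner can be unrolled. This is a minimal sketch using the tmp object from the question; the intermediate names inner and flat are introduced here only for illustration.

```r
# Rebuild the list-matrix from the question
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2, 3)

# Step 1: transpose and wrap in a data.frame; the columns are still lists
inner <- data.frame(t(tmp))
sapply(inner, is.list)   # TRUE for both X1 and X2

# Step 2: flatten each list column into an atomic vector
flat <- lapply(inner, unlist)

# Step 3: assemble the flattened vectors into the final data.frame
df <- data.frame(flat, stringsAsFactors = FALSE)
str(df)
```

Checking sapply(inner, is.list) before flattening is a quick way to confirm whether a data.frame has the "extra level of lists" problem in the first place.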

Or this:
as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE))
You can inspect the result:
str(as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE)))
'data.frame': 3 obs. of 2 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
$ V2: Factor w/ 3 levels "1","2","3": 1 2 3
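Note that unlist() coerces the mixed list to a single character vector, so the numbers arrive as text (and, in R versions before 4.0.0, as factors, as in the output above). One possible follow-up, sketched here under the assumption that the columns should be re-typed automatically, is to run type.convert() over each column. The names m and df are illustrative only.

```r
# Rebuild the list-matrix from the question
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2, 3)

# unlist() yields a character vector, so the matrix is all text
m <- matrix(unlist(tmp), ncol = 2, byrow = TRUE)

# Keep strings as strings, then let type.convert() guess each column's type
df <- as.data.frame(m, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

With this input, type.convert() turns the second column into integers rather than doubles, which may or may not matter for the use case.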

Related

Change column class based on other data frame

I have a data frame dt, and I am trying to convert the class of each of its columns based on a second data frame, col_type.
See the example below for details.
> dt
id <- c(1,2,3,4)
a <- c(1,4,5,6)
b <- as.character(c(0,1,1,4))
c <- as.character(c(0,1,1,0))
d <- c(0,1,1,0)
dt <- data.frame(id,a,b,c,d, stringsAsFactors = FALSE)
> str(dt)
'data.frame': 4 obs. of 5 variables:
$ id: num 1 2 3 4
$ a : num 1 4 5 6
$ b : chr "0" "1" "1" "4"
$ c : chr "0" "1" "1" "0"
$ d : num 0 1 1 0
Now I am trying to convert the class of each column based on the data frame below.
> var
var <- c("id","a","b","c","d")
type <- c("character","numeric","numeric","integer","character")
col_type <- data.frame(var,type, stringsAsFactors = FALSE)
> col_type
  var      type
1  id character
2   a   numeric
3   b   numeric
4   c   integer
5   d character
I want to convert id to the class given in the col_type data frame, and likewise for all other columns.
My attempt:
library(data.table)
setDT(dt)
for(i in 1:ncol(dt)){
  if(colnames(dt)[i]%in%col_type$var){
    a <- col_type[col_type$var==paste0(intersect(colnames(dt)[i],col_type$var)),]
    dt[,col_type$var[i]:=eval(parse(text = paste0("as.",col_type$type[i],"(",col_type$var[i],")")))]
  }
}
Note: my solution works, but it is really slow, and I am wondering if I can do it more efficiently and cleanly.
Suggestions would be appreciated.
I would read the data in with the colClasses argument derived from the col_type table:
library(data.table)
library(magrittr)
setDT(col_type)
res = capture.output(fwrite(dt)) %>% paste(collapse="\n") %>%
fread(colClasses = col_type[, setNames(type, var)])
str(res)
Classes ‘data.table’ and 'data.frame': 4 obs. of 5 variables:
$ id: chr "1" "2" "3" "4"
$ a : num 1 4 5 6
$ b : num 0 1 1 4
$ c : int 0 1 1 0
$ d : chr "0" "1" "1" "0"
- attr(*, ".internal.selfref")=<externalptr>
If you can do this when the data is read in initially, it simplifies to...
res = fread("file.csv", colClasses = col_type[, setNames(type, var)])
It's straightforward to do all of this without data.table.
If somehow the data is never read into R (received as RDS?), there's:
setDT(dt)
res = dt[, Map(as, .SD, col_type$type), .SDcols=col_type$var]
str(res)
Classes ‘data.table’ and 'data.frame': 4 obs. of 5 variables:
$ id: chr "1" "2" "3" "4"
$ a : num 1 4 5 6
$ b : num 0 1 1 4
$ c : int 0 1 1 0
$ d : chr "0" "1" "1" "0"
- attr(*, ".internal.selfref")=<externalptr>
Consider base R's get() inside Map(): get() retrieves a function from its name given as a string, here one of the as.* functions. Then bind the resulting list of vectors into a data frame.
vec_list <- Map(function(v, t) get(paste0("as.", t))(dt[[v]]), col_type$var, col_type$type)
dt_new <- data.frame(vec_list, stringsAsFactors = FALSE)
str(dt_new)
# 'data.frame': 4 obs. of 5 variables:
# $ id: chr "1" "2" "3" "4"
# $ a : num 1 4 5 6
# $ b : num 0 1 1 4
# $ c : int 0 1 1 0
# $ d : chr "0" "1" "1" "0"
Possibly wrap the conversion call in tryCatch() if conversions can fail.
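A minimal sketch of that tryCatch() idea, in base R only. The helper name safe_convert is hypothetical, and the choice to turn coercion warnings (such as "NAs introduced by coercion") into errors is an assumption made here for illustration; other policies are equally reasonable. match.fun() is used in place of get() so that only functions are looked up.

```r
# Example data from the question
dt <- data.frame(id = c(1, 2, 3, 4), a = c(1, 4, 5, 6),
                 b = c("0", "1", "1", "4"), c = c("0", "1", "1", "0"),
                 d = c(0, 1, 1, 0), stringsAsFactors = FALSE)
col_type <- data.frame(var  = c("id", "a", "b", "c", "d"),
                       type = c("character", "numeric", "numeric",
                                "integer", "character"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: look up as.<type> and fail loudly on coercion warnings
safe_convert <- function(x, type) {
  converter <- match.fun(paste0("as.", type))
  tryCatch(
    converter(x),
    warning = function(w) {
      stop("could not convert to ", type, ": ", conditionMessage(w))
    }
  )
}

# Convert every listed column, keeping the column names from dt
dt_new <- data.frame(Map(safe_convert, dt[col_type$var], col_type$type),
                     stringsAsFactors = FALSE)
str(dt_new)
```

Because the conversion is vectorised per column, this loops over columns only, never over rows.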

Avoid (as)data.frame change data to factors when converting from zoo object

If you have a data.frame with numeric columns the conversion is without problems, as explained here.
library(zoo)
dtf=data.frame(matrix(rep(5,10),ncol=2))
#str(dtf)
dtfz <- zoo(dtf)
class(dtfz)
#[1] "zoo"
str(as.data.frame(dtfz))
#'data.frame': 5 obs. of 2 variables:
# $ X1: num 5 5 5 5 5
# $ X2: num 5 5 5 5 5
But if you have a data.frame with text columns, everything is converted to factors, even when setting stringsAsFactors = FALSE:
dtf=data.frame(matrix(rep("d",10),ncol=2),stringsAsFactors = FALSE)
#str(dtf)
dtfz <- zoo(dtf)
#class(dtfz)
#dtfz
All the following convert the strings to factors:
str(as.data.frame(dtfz))
str(as.data.frame(dtfz,stringsAsFactors = FALSE))
str(data.frame(dtfz))
str(data.frame(dtfz,stringsAsFactors = FALSE))
str(as.data.frame(dtfz, check.names=FALSE, row.names=NULL,stringsAsFactors = FALSE))
#'data.frame': 5 obs. of 2 variables:
# $ X1: Factor w/ 1 level "d": 1 1 1 1 1
# $ X2: Factor w/ 1 level "d": 1 1 1 1 1
How to avoid this behaviour when the data.frame has many text columns?
I found the solution based on a comment by @thelatemail. It works for the current version of zoo (Sept/2017). As @G. Grothendieck commented, future versions of zoo will honour the stringsAsFactors = FALSE argument.
str(base::as.data.frame(coredata(dtfz),stringsAsFactors = FALSE))
#'data.frame': 5 obs. of 2 variables:
# $ X1: chr "d" "d" "d" "d" ...
# $ X2: chr "d" "d" "d" "d" ...

Convert list of chars into a data frame in r

Suppose I have the following list:
a=list()
a[1]<-c("1")
a[2]<-c("3")
a[[1]][2]<-c("a")
a[[2]][2]<-c("b")
List of 2
$ : chr [1:2] "1" "a"
$ : chr [1:2] "3" "b"
[[1]]
[1] "1" "a"
[[2]]
[1] "3" "b"
How can I convert that list into a data frame like this?
This is what the result would look like:
table<-data.frame(col1=c("1a","3b"))
col1
1a
3b
'data.frame': 2 obs. of 1 variable:
$ col1: Factor w/ 2 levels "1a","3b": 1 2
You can use do.call as well; however, I feel the answer in the comment is better than this:
df <- setNames(data.frame(do.call("paste0",data.frame(do.call("rbind",a)))),"col1")
You can always read about do.call in the documentation:
do.call constructs and executes a function call from a name or a
function and a list of arguments to be passed to it.
Output:
> df
col1
1 1a
2 3b
> str(df)
'data.frame': 2 obs. of 1 variable:
$ col1: Factor w/ 2 levels "1a","3b": 1 2
Try this out and let me know in case of any queries.
a=list()
a[1]<-c("1")
a[2]<-c("3")
a[[1]][2]<-c("a")
a[[2]][2]<-c("b")
b <- t(data.frame(a))
data.frame(col1=paste0(b[,1],b[,2]))
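Another way to express the same idea, offered as a sketch rather than as the approach from the comment mentioned earlier: collapse each list element into one string with paste(), then build the data frame from the resulting vector. The list a is written out directly here, and stringsAsFactors = FALSE is an assumption (the question's desired output shows a factor, which is what older R versions produce by default).

```r
# Same content as the list built step by step in the question
a <- list(c("1", "a"), c("3", "b"))

# Collapse each character vector into a single string: "1a", "3b"
col1 <- vapply(a, paste, character(1), collapse = "")

# One-column data frame; drop stringsAsFactors = FALSE to get a factor in R < 4.0.0
df <- data.frame(col1 = col1, stringsAsFactors = FALSE)
df
```

vapply() with character(1) also guards against list elements that would not collapse to exactly one string.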

Why are my data.frame columns lists?

I have a data.frame
'data.frame': 4 obs. of 2 variables:
$ name:List of 4
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
..$ : chr "d"
$ tvd :List of 4
..$ : num 0.149
..$ : num 0.188
..$ : num 0.161
..$ : num 0.187
structure(list(name = list("a", "b", "c",
"d"), tvd = list(0.148831029536996, 0.187699857380692,
0.161428147003292, 0.18652668961466)), .Names = c("name",
"tvd"), row.names = c(NA, -4L), class = "data.frame")
It appears that as.data.frame(lapply(z,unlist)) converts it to the usual
'data.frame': 4 obs. of 2 variables:
$ name: Factor w/ 4 levels "a",..: 4 1 2 3
$ tvd : num 0.149 0.188 0.161 0.187
However, I wonder if I could do better.
I create my ugly data frame like this:
as.data.frame(do.call(rbind,lapply(my.list, function (m)
list(name = ...,
tvd = ...))))
I wonder if it is possible to modify this expression so that it would produce a normal data frame.
It looks like you're just trying to tear down your original data then re-assemble it? If so, here are a few cool things to look at. Assume df is your data.
A data.frame is just a list in disguise. To see this, compare df[[1]] to df$name in your data. [[ is used for list indexing, as well as $. So we are actually viewing a list item when we use df$name on a data frame.
> is.data.frame(df) # df is a data frame
# [1] TRUE
> is.list(df) # and it's also a list
# [1] TRUE
> x <- as.list(df) # as.list() can be more useful than unlist() sometimes
# take a look at x here, it's a bit long
> (y <- do.call(cbind, x)) # reassemble to matrix form
# name tvd
# [1,] "a" 0.148831
# [2,] "b" 0.1876999
# [3,] "c" 0.1614281
# [4,] "d" 0.1865267
> as.data.frame(y) # back to df
# name tvd
# 1 a 0.148831
# 2 b 0.1876999
# 3 c 0.1614281
# 4 d 0.1865267
I recommend doing
do.call(rbind,lapply(my.list, function (m)
data.frame(name = ...,
tvd = ...)))
rather than trying to convert a list of lists into a data.frame.
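The difference between the two constructions can be sketched as follows. The contents of my.list and the field names label and score are hypothetical, since the original elides them with "..."; only the shape of the call is taken from the text.

```r
# Hypothetical input; the real structure of my.list is not shown in the question
my.list <- list(list(label = "a", score = 0.149),
                list(label = "b", score = 0.188))

# rbind() on plain lists builds a list-matrix, so every column ends up a list
bad <- as.data.frame(do.call(rbind, lapply(my.list, function(m)
  list(name = m$label, tvd = m$score))))
sapply(bad, is.list)    # TRUE for both columns

# rbind() on one-row data.frames keeps ordinary atomic columns
good <- do.call(rbind, lapply(my.list, function(m)
  data.frame(name = m$label, tvd = m$score, stringsAsFactors = FALSE)))
str(good)
```

For long lists, binding many one-row data frames can be slow; building each column as a vector first and calling data.frame() once is a common alternative.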

creating a data.frame whose column will hold a list in each row [duplicate]

I can't create a data frame with a column whose entries are collections of characters.
Is it not possible? Should I stick with lists?
subsets <- c(list("a","d","e"),list("a","b","c","e"))
customerids <- c(1,1)
transactions <- data.frame(customerid = customerids,subset =subsets)
str(transactions)
'data.frame': 2 obs. of 8 variables:
$ customerid : num 1 1
$ subset..a. : Factor w/ 1 level "a": 1 1
$ subset..d. : Factor w/ 1 level "d": 1 1
$ subset..e. : Factor w/ 1 level "e": 1 1
$ subset..a..1: Factor w/ 1 level "a": 1 1
$ subset..b. : Factor w/ 1 level "b": 1 1
$ subset..c. : Factor w/ 1 level "c": 1 1
$ subset..e..1: Factor w/ 1 level "e": 1 1
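I think you've constructed subsets incorrectly: c() applied to two lists concatenates them into one flat list of seven elements. If what you meant is in fact this: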
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))
# [[1]]
# [1] "a" "d" "e"
# [[2]]
# [1] "a" "b" "c" "e"
And customerids is c(1,1), then subsets can be stored as a list column of the data.frame, because its length (2) matches the number of rows. You can do it as follows:
DF <- data.frame(id = customerids, value = I(subsets))
# id value
# 1 1 a, d, e
# 2 1 a, b, c, e
sapply(DF, class)
# id value
# "numeric" "AsIs"
Now you can access DF$value and perform operations as you would on a list.
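As a short illustration of what "operations as you would on a list" can look like, here is a sketch that rebuilds DF from the lines above and queries the list column. The names n_items and has_b are introduced here purely for illustration.

```r
customerids <- c(1, 1)
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))

# I() keeps the list intact as a single column, one element per row
DF <- data.frame(id = customerids, value = I(subsets))

# Extract the items belonging to the second row
DF$value[[2]]

# Per-row summaries computed from the list column
DF$n_items <- lengths(DF$value)
has_b <- vapply(DF$value, function(s) "b" %in% s, logical(1))
DF[has_b, ]
```

Using [[ rather than [ on the column returns the underlying character vector instead of a one-element list.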
Use data.table instead:
library(data.table)
# note the extra list here
subsets <- list(list("a","d","e"),list("a","b","c","e"))
customerids <- c(1,1)
transactions <- data.table(customerid = customerids, subset = subsets)
str(transactions)
#Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
# $ customerid: num 1 1
# $ subset :List of 2
# ..$ :List of 3
# .. ..$ : chr "a"
# .. ..$ : chr "d"
# .. ..$ : chr "e"
# ..$ :List of 4
# .. ..$ : chr "a"
# .. ..$ : chr "b"
# .. ..$ : chr "c"
# .. ..$ : chr "e"
# - attr(*, ".internal.selfref")=<externalptr>
transactions
# customerid subset
#1: 1 <list>
#2: 1 <list>
