apply() not working when checking column class in a data.frame - r

I have a dataframe. I want to inspect the class of each column.
x1 = rep(1:4, times=5)
x2 = factor(rep(letters[1:4], times=5))
xdat = data.frame(x1, x2)
> class(xdat)
[1] "data.frame"
> class(xdat$x1)
[1] "integer"
> class(xdat$x2)
[1] "factor"
However, imagine that I have many columns and therefore need to use apply() to help me do the trick. But it's not working.
apply(xdat, 2, class)
x1 x2
"character" "character"
Why can't I use apply() to see the data type of each column? What should I do instead?
Thanks!

You could use
sapply(xdat, class)
# x1 x2
# "integer" "factor"
apply() first coerces the data frame to a matrix (via as.matrix()), and a matrix can hold only a single type. If any column is non-numeric, such as a character or factor column, every element becomes character, so class() reports "character" for each column. To understand this, check
str(apply(xdat, 2, I))
#chr [1:20, 1:2] "1" "2" "3" "4" "1" "2" "3" "4" "1" ...
#- attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "x1" "x2"
Now, if we check
str(lapply(xdat, I))
#List of 2
#$ x1:Class 'AsIs' int [1:20] 1 2 3 4 1 2 3 4 1 2 ...
#$ x2: Factor w/ 4 levels "a","b","c","d": 1 2 3 4 1 2 3 4 1 2 ...
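The coercion can also be seen directly, without apply(). The following is a minimal sketch using the question's data; the vapply() variant is one possible alternative to sapply(), and taking only the first class with class(col)[1] is an assumption made here so that every column yields exactly one string.

```r
# Rebuild the example data from the question
x1 <- rep(1:4, times = 5)
x2 <- factor(rep(letters[1:4], times = 5))
xdat <- data.frame(x1, x2)

# apply() works on as.matrix(xdat); the factor column forces a character matrix
m <- as.matrix(xdat)
typeof(m)   # "character"

# vapply() iterates over the columns as list elements, so types are preserved.
# class(col)[1] keeps one value per column even for multi-class columns.
col_classes <- vapply(xdat, function(col) class(col)[1], character(1))
col_classes
```

vapply() checks that each result is a single string, which sapply() does not guarantee.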

Related

Change column class based on other data frame

I have a data frame and I am trying to convert class of each variable of dt based on col_type.
Find example below for more detail.
> dt
id <- c(1,2,3,4)
a <- c(1,4,5,6)
b <- as.character(c(0,1,1,4))
c <- as.character(c(0,1,1,0))
d <- c(0,1,1,0)
dt <- data.frame(id,a,b,c,d, stringsAsFactors = FALSE)
> str(dt)
'data.frame': 4 obs. of 5 variables:
$ id: num 1 2 3 4
$ a : num 1 4 5 6
$ b : chr "0" "1" "1" "4"
$ c : chr "0" "1" "1" "0"
$ d : num 0 1 1 0
Now, I am trying to convert class of each column based on below data frame.
> var
var <- c("id","a","b","c","d")
type <- c("character","numeric","numeric","integer","character")
col_type <- data.frame(var,type, stringsAsFactors = FALSE)
> col_type
var type
1 id character
2 a numeric
3 b numeric
4 c integer
5 d character
I want to convert id to the class mentioned in the col_type data frame, and so on for all other columns.
My Attempts:
setDT(dt)
for(i in 1:ncol(dt)){
if(colnames(dt)[i]%in%col_type$var){
a <- col_type[col_type$var==paste0(intersect(colnames(dt)[i],col_type$var)),]
dt[,col_type$var[i]:=eval(parse(text = paste0("as.",col_type$type[i],"(",col_type$var[i],")")))]
}
}
Note- My solution works but it is really slow and I am wondering if I can do it more efficiently and cleanly.
Suggestions will be appreciated.
I would read the data in with the colClasses argument derived from the col_type table:
library(data.table)
library(magrittr)
setDT(col_type)
res = capture.output(fwrite(dt)) %>% paste(collapse="\n") %>%
fread(colClasses = col_type[, setNames(type, var)])
str(res)
Classes ‘data.table’ and 'data.frame': 4 obs. of 5 variables:
$ id: chr "1" "2" "3" "4"
$ a : num 1 4 5 6
$ b : num 0 1 1 4
$ c : int 0 1 1 0
$ d : chr "0" "1" "1" "0"
- attr(*, ".internal.selfref")=<externalptr>
If you can do this when the data is read in initially, it simplifies to...
res = fread("file.csv", colClasses = col_type[, setNames(type, var)])
It's straightforward to do all of this without data.table.
If somehow the data is never read into R (received as RDS?), there's:
setDT(dt)
res = dt[, Map(as, .SD, col_type$type), .SDcols=col_type$var]
str(res)
Classes ‘data.table’ and 'data.frame': 4 obs. of 5 variables:
$ id: chr "1" "2" "3" "4"
$ a : num 1 4 5 6
$ b : num 0 1 1 4
$ c : int 0 1 1 0
$ d : chr "0" "1" "1" "0"
- attr(*, ".internal.selfref")=<externalptr>
Consider base R's get() inside Map(): it retrieves a function from its name given as a string, here one of the as.* functions. Then bind the resulting list of vectors into a data frame.
vec_list <- Map(function(v, t) get(paste0("as.", t))(dt[[v]]), col_type$var, col_type$type)
dt_new <- data.frame(vec_list, stringsAsFactors = FALSE)
str(dt_new)
# 'data.frame': 4 obs. of 5 variables:
# $ id: chr "1" "2" "3" "4"
# $ a : num 1 4 5 6
# $ b : num 0 1 1 4
# $ c : int 0 1 1 0
# $ d : chr "0" "1" "1" "0"
Possibly wrap the conversion call in tryCatch() if conversions can fail.
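One way to guard the conversions is sketched below. The helper name safe_convert is hypothetical, and match.fun() is used here in place of get() as an assumption about what reads more clearly. The sketch turns coercion warnings (for example "NAs introduced by coercion") into errors; whether that is the right policy depends on the data.

```r
# Example data from the question
dt <- data.frame(id = c(1, 2, 3, 4), a = c(1, 4, 5, 6),
                 b = c("0", "1", "1", "4"), c = c("0", "1", "1", "0"),
                 d = c(0, 1, 1, 0), stringsAsFactors = FALSE)
col_type <- data.frame(var  = c("id", "a", "b", "c", "d"),
                       type = c("character", "numeric", "numeric",
                                "integer", "character"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: look up as.<type> by name and fail loudly on warnings
safe_convert <- function(x, type) {
  tryCatch(
    match.fun(paste0("as.", type))(x),
    warning = function(w) {
      stop("lossy conversion to ", type, ": ", conditionMessage(w))
    }
  )
}

# Convert all listed columns in one assignment
dt[col_type$var] <- Map(safe_convert, dt[col_type$var], col_type$type)
str(dt)
```

A value such as "x" in a column destined for numeric would then stop the run instead of silently becoming NA.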

Why are my data.frame columns lists?

I have a data.frame
'data.frame': 4 obs. of 2 variables:
$ name:List of 4
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
..$ : chr "d"
$ tvd :List of 4
..$ : num 0.149
..$ : num 0.188
..$ : num 0.161
..$ : num 0.187
structure(list(name = list("a", "b", "c",
"d"), tvd = list(0.148831029536996, 0.187699857380692,
0.161428147003292, 0.18652668961466)), .Names = c("name",
"tvd"), row.names = c(NA, -4L), class = "data.frame")
It appears that as.data.frame(lapply(z,unlist)) converts it to the usual
'data.frame': 4 obs. of 2 variables:
$ name: Factor w/ 4 levels "a",..: 4 1 2 3
$ tvd : num 0.149 0.188 0.161 0.187
However, I wonder if I could do better.
I create my ugly data frame like this:
as.data.frame(do.call(rbind,lapply(my.list, function (m)
list(name = ...,
tvd = ...))))
I wonder if it is possible to modify this expression so that it would produce a normal data frame.
It looks like you're just trying to tear down your original data then re-assemble it? If so, here are a few cool things to look at. Assume df is your data.
A data.frame is just a list in disguise. To see this, compare df[[1]] to df$name in your data. [[ is used for list indexing, as well as $. So we are actually viewing a list item when we use df$name on a data frame.
> is.data.frame(df) # df is a data frame
# [1] TRUE
> is.list(df) # and it's also a list
# [1] TRUE
> x <- as.list(df) # as.list() can be more useful than unlist() sometimes
# take a look at x here, it's a bit long
> (y <- do.call(cbind, x)) # reassemble to matrix form
# name tvd
# [1,] "a" 0.148831
# [2,] "b" 0.1876999
# [3,] "c" 0.1614281
# [4,] "d" 0.1865267
> as.data.frame(y) # back to df
# name tvd
# 1 a 0.148831
# 2 b 0.1876999
# 3 c 0.1614281
# 4 d 0.1865267
I recommend doing
do.call(rbind,lapply(my.list, function (m)
data.frame(name = ...,
tvd = ...)))
rather than trying to convert a list of lists into a data.frame.
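As a rough illustration of that recommendation: the question elides what my.list contains, so the list below and its fields id and score are purely hypothetical stand-ins.

```r
# Hypothetical input; the real my.list and its fields are not shown in the question
my.list <- list(list(id = "a", score = 0.149),
                list(id = "b", score = 0.188))

# Build one single-row data.frame per element, then stack them
rows <- lapply(my.list, function(m) {
  data.frame(name = m$id, tvd = m$score, stringsAsFactors = FALSE)
})
res <- do.call(rbind, rows)
str(res)
```

Because each piece is already a data.frame with atomic columns, rbind() yields ordinary character and numeric columns rather than list columns.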

How to convert matrix (or list) into data.frame

I've got a list with different types in it. They are arranged in matrix form:
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2,3)
tmp
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] 1 2 3
That's the form I get it out of another more complex function.
Now I want to transpose it and convert to a data.frame. So I do the following:
data <- as.data.frame(t(tmp))
data
V1 V2
1 a 1
2 b 2
3 c 3
This looks great. But it's got the wrong structure:
str(data)
'data.frame': 3 obs. of 2 variables:
$ V1:List of 3
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
$ V2:List of 3
..$ : num 1
..$ : num 2
..$ : num 3
So how do I get rid of the extra level of lists?
This should do the trick:
df <- data.frame(lapply(data.frame(t(tmp)), unlist), stringsAsFactors=FALSE)
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ X1: chr "a" "b" "c"
# $ X2: num 1 2 3
The inner data.frame() call converts the matrix into a two column data.frame, with one "character" column and one "numeric" column.**
lapply(..., unlist) strips away extra list() layer.
The outer data.frame() call converts the resulting list into the data.frame you're after.
** (OK, that intermediate "character" column is really of class "factor", but it ends up making no difference in the final result. If you like, you could force it to have class "character" by adding stringsAsFactors=FALSE to the inner data.frame() call as well, but I don't think neglecting to do so would ever make a difference...)
Or this :
as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE))
You can inspect the result:
str(as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE)))
'data.frame': 3 obs. of 2 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
$ V2: Factor w/ 3 levels "1","2","3": 1 2 3
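Note that the matrix(unlist(...)) route above turns the numbers into strings (and then factors), because unlist() on mixed input yields a character vector. If the numeric column should stay numeric, one alternative is to unlist each row of the list-matrix separately. This is only a sketch, and the column names V1, V2 are chosen here to mirror the question's output.

```r
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2, 3)

# Each row of tmp holds one variable; unlist rows individually so that
# the character row and the numeric row are never mixed
cols <- lapply(seq_len(nrow(tmp)), function(i) unlist(tmp[i, ]))
names(cols) <- paste0("V", seq_along(cols))

df <- data.frame(cols, stringsAsFactors = FALSE)
str(df)
```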

How to get the container type of an object in R?

Suppose I have an object called v, how do I find out its container type (a vector, a list, a matrix, etc.), without trying each of the is.vector(v), is.list(v) ... ?
There are three functions which will be helpful for you: mode, str and class
First, let's make some data:
nlist <- list(a=c(1,2,3), b=c("a", "b", "c"), c=matrix(rnorm(10),5))
ndata.frame <- data.frame(a=c("a", "b", "c"), b=1:3)
ncharvec <- c("a", "b", "c")
nnumvec <- c(1, 2, 3)
nintvec <- 1:3
So let's use the functions I mentioned above:
mode(nlist)
[1] "list"
str(nlist)
List of 3
$ a: num [1:3] 1 2 3
$ b: chr [1:3] "a" "b" "c"
$ c: num [1:5, 1:2] -0.9469 -0.0602 -0.3601 0.9594 -0.4348 ...
class(nlist)
[1] "list"
Now for the data frame:
mode(ndata.frame)
[1] "list"
This may surprise you, but data frames are simply lists with a data.frame class attribute.
str(ndata.frame)
'data.frame': 3 obs. of 2 variables:
$ a: Factor w/ 3 levels "a","b","c": 1 2 3
$ b: int 1 2 3
class(ndata.frame)
[1] "data.frame"
Note that there are different modes of vectors:
mode(ncharvec)
[1] "character"
mode(nnumvec)
[1] "numeric"
mode(nintvec)
[1] "numeric"
Also see that although nnumvec and nintvec appear identical, they are quite different:
str(nnumvec)
num [1:3] 1 2 3
str(nintvec)
int [1:3] 1 2 3
class(nnumvec)
[1] "numeric"
class(nintvec)
[1] "integer"
Which of these you want determines which function to use. str is generally a good function for looking at variables interactively, whereas the other two are more useful inside functions.
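For use inside functions, the checks can be bundled into one small helper. The name describe is hypothetical, and adding typeof() alongside mode() and class() is an extra not mentioned in the answer; it reports the internal storage type.

```r
# Hypothetical helper: report class, mode and storage type in one call
describe <- function(v) {
  c(class = class(v)[1], mode = mode(v), typeof = typeof(v))
}

describe(c(1, 2, 3))              # numeric vector
describe(1:3)                     # integer vector: mode is still "numeric"
describe(list(a = 1))             # plain list
describe(data.frame(b = 1:3))     # data frame: mode "list"
describe(matrix(1:4, nrow = 2))   # matrix
```

Only the first element of class() is kept, since an object can carry several classes.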

R apply error: 'X' must have named dimnames

The "apply" documentation mentions that "Where 'X' has named dimnames, it can be a character vector selecting dimension names." I would like to use apply on a data.frame for only particular columns. Can I use the dimnames feature to do this?
I realize I can subset() X to only include the columns of interest, but I want to understand "named dimnames" better.
Below is some sample code:
> x <- data.frame(cbind(1,1:10))
> apply(x,2,sum)
X1 X2
10 55
> apply(x,c('X2'),sum)
Error in apply(x, c("X2"), sum) : 'X' must have named dimnames
> dimnames(x)
[[1]]
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
[[2]]
[1] "X1" "X2"
> names(x)
[1] "X1" "X2"
> names(dimnames(x))
NULL
If I understand you correctly, you would like to use apply only on certain columns. This is not what named dimnames would accomplish. The apply function on a matrix or data.frame always applies to all the rows or all the columns. The named dimnames allows you to choose to use rows or columns by name instead of the "normal" 1 and 2:
m <- matrix(1:12,4, dimnames=list(foo=letters[1:4], bar=LETTERS[1:3]))
apply(m, "bar", sum) # Use "bar" instead of 2 to refer to the columns
However if you have the column names you'd like to apply to, you could do it by first selecting only those columns:
n <- c("A","C")
apply(m[,n], 2, sum)
# A C
#10 42
Named dimnames are a side effect of the fact that dimnames are stored as a list in the "dimnames" attribute of a matrix or array. Each component of the list corresponds to one dimension and can be named. This is probably more useful for multidimensional arrays...
For a data.frame, there is no "dimnames" attribute. A data.frame is essentially a list, so the list's "names" attribute corresponds to the column names, and an extra "row.names" attribute corresponds to the row names. Because of this, there is no place to store the names of the dimnames (they could have an extra attribute for that of course, but they didn't). When you call the dimnames function on a data.frame, it simply creates a list from the "row.names" and "names" attributes.
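This can be checked on the question's data. The sketch below also shows one way to work on selected columns of a data.frame by name without apply(); using sapply() for that is a suggestion made here, not something from the documentation quoted in the question.

```r
x <- data.frame(cbind(1, 1:10))

# A data.frame stores "names" and "row.names", but no "dimnames" attribute
has_dimnames_attr <- "dimnames" %in% names(attributes(x))
has_dimnames_attr

# dimnames() builds a list on the fly, and that list has no names
is.null(names(dimnames(x)))

# Selecting columns by name and iterating over them as list elements
sums <- sapply(x["X2"], sum)
sums
```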
The issue is that you can't set names on the dimnames of the data.frame x directly, and when apply() coerces x to a matrix, the resulting dimnames are unnamed.
A solution is to coerce to a matrix first, then name the dimnames and then use apply()
> X <- as.matrix(x)
> str(X)
num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "1" "2" "3" "4" ...
..$ : chr [1:2] "X1" "X2"
> dimnames(X) <- list(C1 = dimnames(x)[[1]], C2 = dimnames(x)[[2]])
> str(X)
num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ C1: chr [1:10] "1" "2" "3" "4" ...
..$ C2: chr [1:2] "X1" "X2"
> apply(X, "C1", mean)
1 2 3 4 5 6 7 8 9 10
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
> rowMeans(X)
1 2 3 4 5 6 7 8 9 10
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
