The "apply" documentation mentions that "Where 'X' has named dimnames, it can be a character vector selecting dimension names." I would like to use apply on a data.frame for only particular columns. Can I use the dimnames feature to do this?
I realize I can subset() X to only include the columns of interest, but I want to understand "named dimnames" better.
Below is some sample code:
> x <- data.frame(cbind(1,1:10))
> apply(x,2,sum)
X1 X2
10 55
> apply(x,c('X2'),sum)
Error in apply(x, c("X2"), sum) : 'X' must have named dimnames
> dimnames(x)
[[1]]
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
[[2]]
[1] "X1" "X2"
> names(x)
[1] "X1" "X2"
> names(dimnames(x))
NULL
If I understand you correctly, you would like to use apply only on certain columns. This is not what named dimnames accomplish. The apply function on a matrix or data.frame always applies to all the rows or all the columns. Named dimnames let you select the rows or the columns by name instead of by the usual 1 and 2:
m <- matrix(1:12,4, dimnames=list(foo=letters[1:4], bar=LETTERS[1:3]))
apply(m, "bar", sum) # Use "bar" instead of 2 to refer to the columns
However if you have the column names you'd like to apply to, you could do it by first selecting only those columns:
n <- c("A","C")
apply(m[,n], 2, sum)
# A C
#10 42
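For the data.frame in the question, the same idea can be sketched as follows. This is a minimal example, assuming X2 is the column of interest; the variable names cols, res_apply and res_sapply are only illustrative. With a single column, drop = FALSE matters, because otherwise the subset collapses to a plain vector and apply() has no dimensions to work on.

```r
# Data frame from the question; its columns are named X1 and X2
x <- data.frame(cbind(1, 1:10))

# Columns to process (illustrative choice)
cols <- c("X2")

# drop = FALSE keeps the subset two-dimensional even for one column,
# so apply() still has a column margin to iterate over
res_apply <- apply(x[, cols, drop = FALSE], 2, sum)

# A data.frame is a list of columns, so sapply() over the subset also works
res_sapply <- sapply(x[cols], sum)
```

Both results are named numeric vectors with the single element X2.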
Named dimnames are a side effect of how dimnames are stored: as a list in the "dimnames" attribute of a matrix or array. Each component of the list corresponds to one dimension and can be named. This is probably more useful for multidimensional arrays...
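As a rough illustration of the multidimensional case, here is a small sketch with a three-dimensional array. The dimension names (sex, group, year) and the data are made up for the example.

```r
# A 2 x 3 x 4 array whose dimnames list is itself named
arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(sex   = c("f", "m"),
                             group = c("g1", "g2", "g3"),
                             year  = paste0("y", 1:4)))

# Sum over everything except "group" (equivalent to MARGIN = 2)
by_group <- apply(arr, "group", sum)

# Several margins can be given by name as well (equivalent to MARGIN = c(1, 3))
by_sex_year <- apply(arr, c("sex", "year"), sum)
```

Referring to margins by name avoids having to remember which position each dimension occupies.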
For a data.frame, there is no "dimnames" attribute. A data.frame is essentially a list, so the list's "names" attribute corresponds to the column names, and an extra "row.names" attribute corresponds to the row names. Because of this, there is no place to store the names of the dimnames (they could have added an extra attribute for that, of course, but they didn't). When you call the dimnames function on a data.frame, it simply creates a list from the "row.names" and "names" attributes.
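The difference can be checked directly by looking at the attributes. The following is a small sketch using the data.frame from the question; the list names "rows" and "cols" are arbitrary.

```r
x <- data.frame(cbind(1, 1:10))
m <- as.matrix(x)

# A data.frame carries names, class and row.names, but no dimnames attribute
df_attrs <- names(attributes(x))

# A matrix carries dim and dimnames
mat_attrs <- names(attributes(m))

# Assigning a named list through dimnames<- sets the row and column names,
# but the names of the list itself ("rows", "cols") have nowhere to be stored
dimnames(x) <- list(rows = as.character(1:10), cols = c("a", "b"))
lost <- names(dimnames(x))   # NULL
```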
The issue is that a data.frame has nowhere to store names for its dimnames, and apply() coerces x to a matrix, which therefore ends up with unnamed dimnames.
A solution is to coerce to a matrix first, then name the dimnames, and then use apply():
> X <- as.matrix(x)
> str(X)
num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "1" "2" "3" "4" ...
..$ : chr [1:2] "X1" "X2"
> dimnames(X) <- list(C1 = dimnames(x)[[1]], C2 = dimnames(x)[[2]])
> str(X)
num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ C1: chr [1:10] "1" "2" "3" "4" ...
..$ C2: chr [1:2] "X1" "X2"
> apply(X, "C1", mean)
1 2 3 4 5 6 7 8 9 10
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
> rowMeans(X)
1 2 3 4 5 6 7 8 9 10
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
I have this initial matrix:
> fil
2 3 6
1 1 1
> str(fil)
Named num [1:3] 1 1 1
- attr(*, "names")= chr [1:3] "2" "3" "6"
When I do this:
which(fil==min(fil,na.rm = TRUE))
I have this returned
> which(fil==min(fil,na.rm = TRUE))
2 3 6
1 2 3
And I wanted the names of the vector to be returned:
2 3 6
When you see output like that in the question, you should suspect that the upper line holds the names of the vector and the lower line holds its values. The vector itself is the lower line, not the first line of the output.
This is confirmed with str:
str(fil)
# Named num [1:3] 1 1 1
# - attr(*, "names")= chr [1:3] "2" "3" "6"
It starts by saying Named num, so it is a named numeric vector.
Then there is an attributes line. The attribute in question is "names", and there are accessor functions for frequently used attributes such as this one.
fil <- c('2' = 1, '3' = 1, '6' = 1)
fil
#2 3 6
#1 1 1
attributes(fil)
#$names
#[1] "2" "3" "6"
There are two ways to get the attribute "names". The second is the shortcut I will use:
attr(fil, "names")
#[1] "2" "3" "6"
names(fil)
#[1] "2" "3" "6"
Now, to answer the question, just subset the names that correspond to the minimum of the vector fil.
names(fil)[which(fil==min(fil,na.rm = TRUE))]
#[1] "2" "3" "6"
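Since which() keeps the names of the elements it selects, a slightly shorter variant is also possible. This is a sketch on a made-up vector v that contains an NA, to show why going through which() is the safer choice.

```r
# Made-up named vector with a tie for the minimum and a missing value
v <- c(a = 3, b = 1, c = NA, d = 1)

# which() drops NA comparisons and keeps the names of the TRUE positions
min_names <- names(which(v == min(v, na.rm = TRUE)))

# Indexing names() with the raw logical vector would carry the NA through
with_na <- names(v)[v == min(v, na.rm = TRUE)]
```

Here min_names contains only "b" and "d", whereas with_na also contains an NA entry for the missing element.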
After a previous post regarding coercion of variables into their appropriate format, I realized that the problem is due to calling unlist(), which appears to strip the class of the variables.
Consider a nested list (myList) of the following structure
> str(myList)
List of 2
$ lst1:List of 3
..$ var1: chr [1:4] "A" "B" "C" "D"
..$ var2: num [1:4] 1 2 3 4
..$ var3: Date[1:4], format: "1999-01-01" "2000-01-01" "2001-01-01" "2002-01-01"
$ lst2:List of 3
..$ var1: chr [1:4] "Q" "W" "E" "R"
..$ var2: num [1:4] 11 22 33 44
..$ var3: Date[1:4], format: "1999-01-02" "2000-01-03" "2001-01-04" "2002-01-05"
which contains different object types (character, numeric and Date) at the lowest level. I've been using
myNewLst <- lapply(myList, function(x) unlist(x,recursive=FALSE))
result <- do.call("rbind", myNewLst)
to get the desired structure of my resulting matrix. However, this yields a coercion into character for all variables, as seen here:
> str(result)
chr [1:2, 1:12] "A" "Q" "B" "W" "C" "E" "D" "R" "1" "11" "2" "22" "3" "33" "4" "44" "10592" "10593" "10957" "10959" "11323" "11326" ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "lst1" "lst2"
..$ : chr [1:12] "var11" "var12" "var13" "var14" ...
After reading a post on a similar issue, I've attempted to utilize do.call("c", x)
myNewLst <- lapply(myList, function(x) do.call("c", x))
result <- do.call("rbind", myNewLst)
Unfortunately, this also results in all variables being characters, as in my first attempt. So my question is: How do I unlist a nested list without losing the object class of my lower-level variables? Are there alternatives which will accomplish the desired result?
Reproducible code for myList:
myList <- list(
"lst1" = list(
"var1" = c("A","B","C","D"),
"var2" = c(1,2,3,4),
"var3" = c(as.Date('1999/01/01'),as.Date('2000/01/01'),as.Date('2001/01/01'),as.Date('2002/01/01'))
),
"lst2" = list(
"var1" = c("Q","W","E","R"),
"var2" = c(11,22,33,44),
"var3" = c(as.Date('1999/01/02'),as.Date('2000/01/03'),as.Date('2001/01/4'),as.Date('2002/01/05'))
)
)
You can use Reduce() or do.call() to combine all of the sublists into one data frame. The code below should work:
Reduce(rbind,lapply(myList,data.frame,stringsAsFactors=F))
var1 var2 var3
1 A 1 1999-01-01
2 B 2 2000-01-01
3 C 3 2001-01-01
4 D 4 2002-01-01
5 Q 11 1999-01-02
6 W 22 2000-01-03
7 E 33 2001-01-04
8 R 44 2002-01-05
Also the class is maintained:
mapply(class,Reduce(rbind,lapply(myList,data.frame,stringsAsFactors=F)))
var1 var2 var3
"character" "numeric" "Date"
If your goal is to convert this list of lists into a single data frame, the following code should work:
result <- data.frame(var1 = unlist(lapply(myList, function(e) e[1]), use.names = FALSE),
var2 = unlist(lapply(myList, function(e) e[2]), use.names = FALSE),
var3 = as.Date(unlist(lapply(myList, function(e) e[3]), use.names = FALSE), origin = "1970-01-01"))
This gives:
> result
var1 var2 var3
1 A 1 1999-01-01
2 B 2 2000-01-01
3 C 3 2001-01-01
4 D 4 2002-01-01
5 Q 11 1999-01-02
6 W 22 2000-01-03
7 E 33 2001-01-04
8 R 44 2002-01-05
Of course, you could use a for-loop to make the code more succinct if there are multiple variables in each list.
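One way such a loop might look is sketched below. It assumes every sublist has the same variable names, and it uses do.call(c, ...) rather than unlist() so that the Date class survives without an explicit as.Date() call. The shortened myList and the name cols are only for illustration.

```r
# Shortened version of the list from the question
myList <- list(
  lst1 = list(var1 = c("A", "B"), var2 = c(1, 2),
              var3 = as.Date(c("1999-01-01", "2000-01-01"))),
  lst2 = list(var1 = c("Q", "W"), var2 = c(11, 22),
              var3 = as.Date(c("1999-01-02", "2000-01-03")))
)

# Assumption: all sublists share the variable names of the first one
vars <- names(myList[[1]])
cols <- vector("list", length(vars))
names(cols) <- vars

for (v in vars) {
  # Collect the pieces of variable v from every sublist ...
  pieces <- lapply(myList, function(e) e[[v]])
  # ... and concatenate them with c(), which keeps classes such as Date
  cols[[v]] <- do.call(c, unname(pieces))
}

result <- data.frame(cols, stringsAsFactors = FALSE)
```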
I have a dataframe. I want to inspect the class of each column.
x1 = rep(1:4, times=5)
x2 = factor(rep(letters[1:4], times=5))
xdat = data.frame(x1, x2)
> class(xdat)
[1] "data.frame"
> class(xdat$x1)
[1] "integer"
> class(xdat$x2)
[1] "factor"
However, imagine that I have many columns and therefore need to use apply() to help me do the trick. But it's not working.
apply(xdat, 2, class)
x1 x2
"character" "character"
Why can't I use apply() to see the data type of each column? What should I do instead?
Thanks!
You could use
sapply(xdat, class)
# x1 x2
# "integer" "factor"
Using apply() first coerces its input to a matrix, and a matrix can hold only a single type. If any column is non-numeric (here x2 is a factor), every value is converted to character, so each column reports "character". To see this, check
str(apply(xdat, 2, I))
#chr [1:20, 1:2] "1" "2" "3" "4" "1" "2" "3" "4" "1" ...
#- attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "x1" "x2"
Now, if we check
str(lapply(xdat, I))
#List of 2
#$ x1:Class 'AsIs' int [1:20] 1 2 3 4 1 2 3 4 1 2 ...
#$ x2: Factor w/ 4 levels "a","b","c","d": 1 2 3 4 1 2 3 4 1 2 ...
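The coercion step can also be made visible directly. The sketch below rebuilds xdat from the question; num_only and the other result names are illustrative. vapply() is used here as a stricter alternative to sapply(), taking only the first class of each column so the result is always one string per column.

```r
xdat <- data.frame(x1 = rep(1:4, times = 5),
                   x2 = factor(rep(letters[1:4], times = 5)))

# apply() works on as.matrix(xdat); with a factor column present,
# the whole matrix becomes character
matrix_type <- typeof(as.matrix(xdat))

# Column-wise inspection without going through a matrix
col_classes <- vapply(xdat, function(col) class(col)[1], character(1))

# If only numeric columns are kept, the matrix stays numeric
# and apply() reports the expected class
num_only <- xdat[vapply(xdat, is.numeric, logical(1))]
num_classes <- apply(num_only, 2, class)
```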
I have a data.frame
'data.frame': 4 obs. of 2 variables:
$ name:List of 4
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
..$ : chr "d"
$ tvd :List of 4
..$ : num 0.149
..$ : num 0.188
..$ : num 0.161
..$ : num 0.187
structure(list(name = list("a", "b", "c",
"d"), tvd = list(0.148831029536996, 0.187699857380692,
0.161428147003292, 0.18652668961466)), .Names = c("name",
"tvd"), row.names = c(NA, -4L), class = "data.frame")
It appears that as.data.frame(lapply(z,unlist)) converts it to the usual
'data.frame': 4 obs. of 2 variables:
$ name: Factor w/ 4 levels "a",..: 4 1 2 3
$ tvd : num 0.149 0.188 0.161 0.187
However, I wonder if I could do better.
I create my ugly data frame like this:
as.data.frame(do.call(rbind,lapply(my.list, function (m)
list(name = ...,
tvd = ...))))
I wonder if it is possible to modify this expression so that it produces a normal data frame.
It looks like you're just trying to tear down your original data then re-assemble it? If so, here are a few cool things to look at. Assume df is your data.
A data.frame is just a list in disguise. To see this, compare df[[1]] to df$name in your data. [[ is used for list indexing, as well as $. So we are actually viewing a list item when we use df$name on a data frame.
> is.data.frame(df) # df is a data frame
# [1] TRUE
> is.list(df) # and it's also a list
# [1] TRUE
> x <- as.list(df) # as.list() can be more useful than unlist() sometimes
# take a look at x here, it's a bit long
> (y <- do.call(cbind, x)) # reassemble to matrix form
# name tvd
# [1,] "a" 0.148831
# [2,] "b" 0.1876999
# [3,] "c" 0.1614281
# [4,] "d" 0.1865267
> as.data.frame(y) # back to df
# name tvd
# 1 a 0.148831
# 2 b 0.1876999
# 3 c 0.1614281
# 4 d 0.1865267
I recommend doing
do.call(rbind,lapply(my.list, function (m)
data.frame(name = ...,
tvd = ...)))
rather than trying to convert a list of lists into a data.frame.
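To make the difference concrete, here is a small sketch. The contents of my.list and the field names label and score are hypothetical stand-ins, since the original expressions are elided in the question.

```r
# Hypothetical input: each element carries a label and a score
my.list <- list(list(label = "a", score = 0.15),
                list(label = "b", score = 0.19),
                list(label = "c", score = 0.16))

# Build a one-row data.frame per element, then stack the rows
rows <- lapply(my.list, function(m)
  data.frame(name = m$label, tvd = m$score, stringsAsFactors = FALSE))
result <- do.call(rbind, rows)

# For comparison: rbind-ing plain lists yields list columns
ugly <- as.data.frame(do.call(rbind, lapply(my.list, function(m)
  list(name = m$label, tvd = m$score))))
```

In result the columns are ordinary atomic vectors, while in ugly each column is a list.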
I've got a list with different types in it. They are arranged in matrix form:
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2,3)
tmp
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] 1 2 3
That's the form I get it out of another more complex function.
Now I want to transpose it and convert to a data.frame. So I do the following:
data <- as.data.frame(t(tmp))
data
V1 V2
1 a 1
2 b 2
3 c 3
This looks great. But it's got the wrong structure:
str(data)
'data.frame': 3 obs. of 2 variables:
$ V1:List of 3
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
$ V2:List of 3
..$ : num 1
..$ : num 2
..$ : num 3
So how do I get rid of the extra level of lists?
This should do the trick:
df <- data.frame(lapply(data.frame(t(tmp)), unlist), stringsAsFactors=FALSE)
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ X1: chr "a" "b" "c"
# $ X2: num 1 2 3
The inner data.frame() call converts the matrix into a two column data.frame, with one "character" column and one "numeric" column.**
lapply(..., unlist) strips away extra list() layer.
The outer data.frame() call converts the resulting list into the data.frame you're after.
** (OK, that intermediate "character" column is really of class "factor", but it ends up making no difference in the final result. If you like, you could force it to have class "character" by adding stringsAsFactors=FALSE to the inner data.frame() call as well, but I don't think neglecting to do so would ever make a difference...)
Or this:
as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE))
You can inspect the result:
str(as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE)))
'data.frame': 3 obs. of 2 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
$ V2: Factor w/ 3 levels "1","2","3": 1 2 3
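As the str() output shows, going through unlist() turns everything into one type, so the numeric column is lost. One possible way to recover the column types afterwards is type.convert(), sketched here under the assumption that the values can be parsed back from their character form.

```r
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2, 3)

# unlist() coerces all values to character; keep them as plain strings
df <- as.data.frame(matrix(unlist(tmp), ncol = 2, byrow = TRUE),
                    stringsAsFactors = FALSE)

# Re-detect the type of each column; as.is = TRUE leaves text as character
df[] <- lapply(df, type.convert, as.is = TRUE)
```

After this step V1 is a character column and V2 is an integer column.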