Row names & column names in R - r

Do the following function pairs generate exactly the same results?
Pair 1) names() & colnames()
Pair 2) rownames() & row.names()

As Oscar Wilde said
Consistency is the last refuge of the
unimaginative.
R is more of an evolved rather than designed language, so these things happen. names() and colnames() work on a data.frame but names() does not work on a matrix:
R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta" "gamma"
R>

Just to expand a little on Dirk's example:
It helps to think of a data frame as a list with equal length vectors. That's probably why names works with a data frame but not a matrix.
The other useful function is dimnames which returns the names for every dimension. You will notice that the rownames function actually just returns the first element from dimnames.
Regarding rownames and row.names: I can't tell the difference, although rownames uses dimnames while row.names was written outside of R. They both also seem to work with higher dimensional arrays:
>a <- array(1:5, 1:4)
> a[1,,,]
> rownames(a) <- "a"
> row.names(a)
[1] "a"
> a
, , 1, 1
[,1] [,2]
a 1 2
> dimnames(a)
[[1]]
[1] "a"
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL

I think that using colnames and rownames makes the most sense; here's why.
Using names has several disadvantages. You have to remember that it means "column names", and it only works with data frame, so you'll need to call colnames whenever you use matrices. By calling colnames, you only have to remember one function. Finally, if you look at the code for colnames, you will see that it calls names in the case of a data frame anyway, so the output is identical.
rownames and row.names return the same values for data frame and matrices; the only difference that I have spotted is that where there aren't any names, rownames will print "NULL" (as does colnames), but row.names returns it invisibly. Since there isn't much to choose between the two functions, rownames wins on the grounds of aesthetics, since it pairs more prettily withcolnames. (Also, for the lazy programmer, you save a character of typing.)

And another expansion:
# create dummy matrix
set.seed(10)
m <- matrix(round(runif(25, 1, 5)), 5)
d <- as.data.frame(m)
If you want to assign new column names you can do following on data.frame:
# an identical effect can be achieved with colnames()
names(d) <- LETTERS[1:5]
> d
A B C D E
1 3 2 4 3 4
2 2 2 3 1 3
3 3 2 1 2 4
4 4 3 3 3 2
5 1 3 2 4 3
If you, however run previous command on matrix, you'll mess things up:
names(m) <- LETTERS[1:5]
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 3 2 4 3 4
[2,] 2 2 3 1 3
[3,] 3 2 1 2 4
[4,] 4 3 3 3 2
[5,] 1 3 2 4 3
attr(,"names")
[1] "A" "B" "C" "D" "E" NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[20] NA NA NA NA NA NA
Since matrix can be regarded as two-dimensional vector, you'll assign names only to first five values (you don't want to do that, do you?). In this case, you should stick with colnames().
So there...

Related

cbind multiple dataframes, that are stored in a list [duplicate]

I would like to find a way to create a data.frame by using cbind() to join together many separate objects. For example, if A, B, C & D are all vectors of equal length, one can create data.frame ABCD with
ABCD <- cbind(A,B,C,D)
However, when the number of objects to be combined gets large, it becomes tedious to type out all of their names. Furthermore, Is there a way to call cbind() on a vector of object names, e.g.
objs <- c("A", "B", "C", "D")
ABCD <- cbind(objs)
or on a list containing all the objects to be combined, e.g.
obj.list <- list(A,B,C,D)
ABCD <- cbind(obj.list)
Currently, the only workaround I can think of is to use paste(), cat(), write.table(), and source() to construct the arguments to cbind(), write it as a script and source it. This seems like a very nasty kludge. Also, I have looked into do.call() but can't seem to find a way to accomplish what I want with it.
The do.call function is very useful here:
A <- 1:10
B <- 11:20
C <- 20:11
> do.call(cbind, list(A,B,C))
[,1] [,2] [,3]
[1,] 1 11 20
[2,] 2 12 19
[3,] 3 13 18
[4,] 4 14 17
[5,] 5 15 16
[6,] 6 16 15
[7,] 7 17 14
[8,] 8 18 13
[9,] 9 19 12
[10,] 10 20 11
First you need to get the objects you want and store them together as a list; if you can construct their names as strings, you use the get function. Here I create two variables, A and B:
> A <- 1:4
> B <- rep(LETTERS[1:2],2)
I then construct a character vector containing their names (stored as ns) and get these variables using lapply. I then set the names of the list to be the same as their original names.
> (ns <- LETTERS[1:2])
[1] "A" "B"
> obj.list <- lapply(ns, get)
> names(obj.list) <- ns
> obj.list
$A
[1] 1 2 3 4
$B
[1] "A" "B" "A" "B"
Then you can use do.call; the first argument is the function you want and the second is a list with the arguments you want to pass to it.
> do.call(cbind, obj.list)
A B
[1,] "1" "A"
[2,] "2" "B"
[3,] "3" "A"
[4,] "4" "B"
However, as aL3xa correctly notes, this makes a matrix, not a data frame, which may not be what you want if the variables are different classes; here my A has been coerced to a character vector instead of a numeric vector. To make a data frame from a list, you just call data.frame on it; then the classes of the variables are retained.
> (AB <- data.frame(obj.list))
A B
1 1 A
2 2 B
3 3 A
4 4 B
> sapply(AB, class)
A B
"integer" "factor"
> str(AB)
'data.frame': 4 obs. of 2 variables:
$ A: int 1 2 3 4
$ B: Factor w/ 2 levels "A","B": 1 2 1 2
You should bare in mind, though, that cbind will return an atomic vector (matrix) when applied solely on atomic vectors (double in this case). As you can see in #prasad's and #Aaron's answers, resulting object is a matrix. If you specify other atomic vectors (integer, double, logical, complex) along with character vector, they will get coerced to character. And then you have a problem - you have to convert them to desired classes. So,
if A, B, C & D are all vectors of
equal length, one can create
data.frame ABCD with
ABCD <- data.frame(A, B, C, D)
Perhaps you should ask "how can I easily gather various vectors of equal length and put them in a data.frame"? cbind is great, but sometimes it's not what you're looking for...
You can put all vectors in environment into list using eapply.
obj.list <- eapply(.GlobalEnv,function(x) if(is.vector(x)) x)
obj.list <- obj.list[names(obj.list) %in% LETTERS]

I have a list of matrices with equal number of rows. I want to cbind the matrices [duplicate]

I would like to find a way to create a data.frame by using cbind() to join together many separate objects. For example, if A, B, C & D are all vectors of equal length, one can create data.frame ABCD with
ABCD <- cbind(A,B,C,D)
However, when the number of objects to be combined gets large, it becomes tedious to type out all of their names. Furthermore, Is there a way to call cbind() on a vector of object names, e.g.
objs <- c("A", "B", "C", "D")
ABCD <- cbind(objs)
or on a list containing all the objects to be combined, e.g.
obj.list <- list(A,B,C,D)
ABCD <- cbind(obj.list)
Currently, the only workaround I can think of is to use paste(), cat(), write.table(), and source() to construct the arguments to cbind(), write it as a script and source it. This seems like a very nasty kludge. Also, I have looked into do.call() but can't seem to find a way to accomplish what I want with it.
The do.call function is very useful here:
A <- 1:10
B <- 11:20
C <- 20:11
> do.call(cbind, list(A,B,C))
[,1] [,2] [,3]
[1,] 1 11 20
[2,] 2 12 19
[3,] 3 13 18
[4,] 4 14 17
[5,] 5 15 16
[6,] 6 16 15
[7,] 7 17 14
[8,] 8 18 13
[9,] 9 19 12
[10,] 10 20 11
First you need to get the objects you want and store them together as a list; if you can construct their names as strings, you use the get function. Here I create two variables, A and B:
> A <- 1:4
> B <- rep(LETTERS[1:2],2)
I then construct a character vector containing their names (stored as ns) and get these variables using lapply. I then set the names of the list to be the same as their original names.
> (ns <- LETTERS[1:2])
[1] "A" "B"
> obj.list <- lapply(ns, get)
> names(obj.list) <- ns
> obj.list
$A
[1] 1 2 3 4
$B
[1] "A" "B" "A" "B"
Then you can use do.call; the first argument is the function you want and the second is a list with the arguments you want to pass to it.
> do.call(cbind, obj.list)
A B
[1,] "1" "A"
[2,] "2" "B"
[3,] "3" "A"
[4,] "4" "B"
However, as aL3xa correctly notes, this makes a matrix, not a data frame, which may not be what you want if the variables are different classes; here my A has been coerced to a character vector instead of a numeric vector. To make a data frame from a list, you just call data.frame on it; then the classes of the variables are retained.
> (AB <- data.frame(obj.list))
A B
1 1 A
2 2 B
3 3 A
4 4 B
> sapply(AB, class)
A B
"integer" "factor"
> str(AB)
'data.frame': 4 obs. of 2 variables:
$ A: int 1 2 3 4
$ B: Factor w/ 2 levels "A","B": 1 2 1 2
You should bare in mind, though, that cbind will return an atomic vector (matrix) when applied solely on atomic vectors (double in this case). As you can see in #prasad's and #Aaron's answers, resulting object is a matrix. If you specify other atomic vectors (integer, double, logical, complex) along with character vector, they will get coerced to character. And then you have a problem - you have to convert them to desired classes. So,
if A, B, C & D are all vectors of
equal length, one can create
data.frame ABCD with
ABCD <- data.frame(A, B, C, D)
Perhaps you should ask "how can I easily gather various vectors of equal length and put them in a data.frame"? cbind is great, but sometimes it's not what you're looking for...
You can put all vectors in environment into list using eapply.
obj.list <- eapply(.GlobalEnv,function(x) if(is.vector(x)) x)
obj.list <- obj.list[names(obj.list) %in% LETTERS]

How to keep the names of the data frame variables while unlisting? [duplicate]

I would like to find a way to create a data.frame by using cbind() to join together many separate objects. For example, if A, B, C & D are all vectors of equal length, one can create data.frame ABCD with
ABCD <- cbind(A,B,C,D)
However, when the number of objects to be combined gets large, it becomes tedious to type out all of their names. Furthermore, Is there a way to call cbind() on a vector of object names, e.g.
objs <- c("A", "B", "C", "D")
ABCD <- cbind(objs)
or on a list containing all the objects to be combined, e.g.
obj.list <- list(A,B,C,D)
ABCD <- cbind(obj.list)
Currently, the only workaround I can think of is to use paste(), cat(), write.table(), and source() to construct the arguments to cbind(), write it as a script and source it. This seems like a very nasty kludge. Also, I have looked into do.call() but can't seem to find a way to accomplish what I want with it.
The do.call function is very useful here:
A <- 1:10
B <- 11:20
C <- 20:11
> do.call(cbind, list(A,B,C))
[,1] [,2] [,3]
[1,] 1 11 20
[2,] 2 12 19
[3,] 3 13 18
[4,] 4 14 17
[5,] 5 15 16
[6,] 6 16 15
[7,] 7 17 14
[8,] 8 18 13
[9,] 9 19 12
[10,] 10 20 11
First you need to get the objects you want and store them together as a list; if you can construct their names as strings, you use the get function. Here I create two variables, A and B:
> A <- 1:4
> B <- rep(LETTERS[1:2],2)
I then construct a character vector containing their names (stored as ns) and get these variables using lapply. I then set the names of the list to be the same as their original names.
> (ns <- LETTERS[1:2])
[1] "A" "B"
> obj.list <- lapply(ns, get)
> names(obj.list) <- ns
> obj.list
$A
[1] 1 2 3 4
$B
[1] "A" "B" "A" "B"
Then you can use do.call; the first argument is the function you want and the second is a list with the arguments you want to pass to it.
> do.call(cbind, obj.list)
A B
[1,] "1" "A"
[2,] "2" "B"
[3,] "3" "A"
[4,] "4" "B"
However, as aL3xa correctly notes, this makes a matrix, not a data frame, which may not be what you want if the variables are different classes; here my A has been coerced to a character vector instead of a numeric vector. To make a data frame from a list, you just call data.frame on it; then the classes of the variables are retained.
> (AB <- data.frame(obj.list))
A B
1 1 A
2 2 B
3 3 A
4 4 B
> sapply(AB, class)
A B
"integer" "factor"
> str(AB)
'data.frame': 4 obs. of 2 variables:
$ A: int 1 2 3 4
$ B: Factor w/ 2 levels "A","B": 1 2 1 2
You should bare in mind, though, that cbind will return an atomic vector (matrix) when applied solely on atomic vectors (double in this case). As you can see in #prasad's and #Aaron's answers, resulting object is a matrix. If you specify other atomic vectors (integer, double, logical, complex) along with character vector, they will get coerced to character. And then you have a problem - you have to convert them to desired classes. So,
if A, B, C & D are all vectors of
equal length, one can create
data.frame ABCD with
ABCD <- data.frame(A, B, C, D)
Perhaps you should ask "how can I easily gather various vectors of equal length and put them in a data.frame"? cbind is great, but sometimes it's not what you're looking for...
You can put all vectors in environment into list using eapply.
obj.list <- eapply(.GlobalEnv,function(x) if(is.vector(x)) x)
obj.list <- obj.list[names(obj.list) %in% LETTERS]

returning matrix column indices matching value(s) in R

I'm looking for a fast way to return the indices of columns of a matrix that match values provided in a vector (ideally of length 1 or the same as the number of rows in the matrix)
for instance:
mat <- matrix(1:100,10)
values <- c(11,2,23,12,35,6,97,3,9,10)
the desired function, which I call rowMatches() would return:
rowMatches(mat, values)
[1] 2 1 3 NA 4 1 10 NA 1 1
Indeed, value 11 is first found at the 2nd column of the first row, value 2 appears at the 1st column of the 2nd row, value 23 is at the 3rd column of the 3rd row, value 12 is not in the 4th row... and so on.
Since I haven't found any solution in package matrixStats, I came up with this function:
rowMatches <- function(mat,values) {
res <- integer(nrow(mat))
matches <- mat == values
for (col in ncol(mat):1) {
res[matches[,col]] <- col
}
res[res==0] <- NA
res
}
For my intended use, there will be millions of rows and few columns. So splitting the matrix into rows (in a list called, say, rows) and calling Map(match, as.list(values), rows) would be way too slow.
But I'm not satisfied by my function because there is a loop, which may be slow if there are many columns. It should be possible to use apply() on columns, but it won't make it faster.
Any ideas?
res <- arrayInd(match(values, mat), .dim = dim(mat))
res[res[, 1] != seq_len(nrow(res)), 2] <- NA
# [,1] [,2]
# [1,] 1 2
# [2,] 2 1
# [3,] 3 3
# [4,] 2 NA
# [5,] 5 4
# [6,] 6 1
# [7,] 7 10
# [8,] 3 NA
# [9,] 9 1
#[10,] 10 1
Roland's answer is good, but I'll post an alternative solution:
res <- which(mat==values, arr.ind = T)
res <- res[match(seq_len(nrow(mat)), res[,1]), 2]

Getting a row from a data frame as a vector in R

I know that to get a row from a data frame in R, we can do this:
data[row,]
where row is an integer. But that spits out an ugly looking data structure where every column is labeled with the names of the column names. How can I just get it a row as a list of value?
Data.frames created by importing data from a external source will have their data transformed to factors by default. If you do not want this set stringsAsFactors=FALSE
In this case to extract a row or a column as a vector you need to do something like this:
as.numeric(as.vector(DF[1,]))
or like this
as.character(as.vector(DF[1,]))
You can't necessarily get it as a vector because each column might have a different mode. You might have numerics in one column and characters in the next.
If you know the mode of the whole row, or can convert to the same type, you can use the mode's conversion function (for example, as.numeric()) to convert to a vector. For example:
> state.x77[1,]
Population Income Illiteracy Life Exp Murder HS Grad Frost
3615.00 3624.00 2.10 69.05 15.10 41.30 20.00
Area
50708.00
> as.numeric(state.x77[1,])
[1] 3615.00 3624.00 2.10 69.05 15.10 41.30 20.00 50708.00
This would work even if some of the columns were integers, although they would be converted to numeric floating-point numbers.
There is a problem with what you propose; namely that the components of data frames (what you call columns) can be of different data types. If you want a single row as a vector, that must contain only a single data type - they are atomic vectors!
Here is an example:
> set.seed(2)
> dat <- data.frame(A = 1:10, B = sample(LETTERS[1:4], 10, replace = TRUE))
> dat
A B
1 1 A
2 2 C
3 3 C
4 4 A
5 5 D
6 6 D
7 7 A
8 8 D
9 9 B
10 10 C
> dat[1, ]
A B
1 1 A
If we force it to drop the empty (column), the only recourse for R is to convert the row to a list to maintain the disparate data types.
> dat[1, , drop = TRUE]
$A
[1] 1
$B
[1] A
Levels: A B C D
The only logical solution to this it to get the data frame into a common type by coercing it to a matrix. This is done via data.matrix() for example:
> mat <- data.matrix(dat)
> mat[1,]
A B
1 1
data.matrix() converts factors to their internal numeric codes. The above allows the first row to be extracted as a vector.
However, if you have character data in the data frame, the only recourse will be to create a character matrix, which may or may not be useful, and data.matrix() now can't be used, we need as.matrix() instead:
> dat$String <- LETTERS[1:10]
> str(dat)
'data.frame': 10 obs. of 3 variables:
$ A : int 1 2 3 4 5 6 7 8 9 10
$ B : Factor w/ 4 levels "A","B","C","D": 1 3 3 1 4 4 1 4 2 3
$ String: chr "A" "B" "C" "D" ...
> mat <- data.matrix(dat)
Warning message:
NAs introduced by coercion
> mat
A B String
[1,] 1 1 NA
[2,] 2 3 NA
[3,] 3 3 NA
[4,] 4 1 NA
[5,] 5 4 NA
[6,] 6 4 NA
[7,] 7 1 NA
[8,] 8 4 NA
[9,] 9 2 NA
[10,] 10 3 NA
> mat <- as.matrix(dat)
> mat
A B String
[1,] " 1" "A" "A"
[2,] " 2" "C" "B"
[3,] " 3" "C" "C"
[4,] " 4" "A" "D"
[5,] " 5" "D" "E"
[6,] " 6" "D" "F"
[7,] " 7" "A" "G"
[8,] " 8" "D" "H"
[9,] " 9" "B" "I"
[10,] "10" "C" "J"
> mat[1, ]
A B String
" 1" "A" "A"
> class(mat[1, ])
[1] "character"
How about this?
library(tidyverse)
dat <- as_tibble(iris)
pulled_row <- dat %>% slice(3) %>% flatten_chr()
If you know all the values are same type, then use flatten_xxx.
Otherwise, I think flatten_chr() is safer.
As user "Reinstate Monica" notes, this problem has two parts:
A data frame will often have different data types in each column that need to be coerced to character strings.
Even after coercing the columns to character format, the data.frame "shell" needs to stripped-off to create a vector via a command like unlist.
With a combination of dplyr and base R this can be done in two lines. First, mutate_all converts all columns to character format. Second, the unlist commands extracts the vector out of the data.frame structure.
My particular issue was that the second line of a csv included the actual column names. So, I wanted to extract the second row to a vector and use that to assign column names. The following worked to extract the row as a character vector:
library(dplyr)
data_col_names <- data[2, ] %>%
mutate_all(as.character) %>%
unlist(., use.names=FALSE)
# example of using extracted row to rename cols
names(data) <- data_col_names
# only for this example, you'd want to remove row 2
# data <- data[-2, ]
(Note: Using as.character() in place of unlist will work too but it's less intuitive to apply as.character twice.)
I see that the most short variant is
c(t(data[row,]))
However if at least one column in data is a column of strings, so it will return string vector.

Resources