Extracting columns from a data frame in R [duplicate]


Sorry, this seems like a really silly question, but are dataframe[ ,-1] and dataframe[-1] the same, and does this work for all data types?
And if so, why are they the same?

Almost.
[-1] uses the fact that a data.frame is a list, so when you do dataframe[-1] it returns another data.frame (list) without the first element (i.e. column).
[ ,-1] uses the fact that a data.frame can also be indexed like a two-dimensional array (a matrix), so when you do dataframe[, -1] you get the sub-array that does not include the first column.
A priori, they sound the same, but the second form also tries by default to reduce the dimension of the sub-array it returns. So depending on the dimensions of your data frame you may get a data.frame or, when only one column is left, a vector. For example:
> data <- data.frame(a = 1:2, b = 3:4)
> class(data[-1])
[1] "data.frame"
> class(data[, -1])
[1] "integer"
You can use drop = FALSE to override that behavior:
> class(data[, -1, drop = FALSE])
[1] "data.frame"
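A minimal sketch of the same comparison using identical(), with two hypothetical toy data frames (df2 and df3 are illustrative names, not from the question): when two or more columns remain, both forms return the same data frame; when only one remains, [, -1] drops to a vector unless drop = FALSE is given.

```r
# df2 and df3 are hypothetical toy data frames for illustration
df3 <- data.frame(a = 1:2, b = 3:4, c = 5:6)
df2 <- data.frame(a = 1:2, b = 3:4)

# Two columns remain: list-style and matrix-style indexing agree
identical(df3[-1], df3[, -1])                 # TRUE

# One column remains: [, -1] drops to a plain integer vector by default
identical(df2[-1], df2[, -1])                 # FALSE
df2[, -1]                                     # [1] 3 4

# drop = FALSE gives back the same data frame as list-style indexing
identical(df2[-1], df2[, -1, drop = FALSE])   # TRUE
```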

dataframe[-1] treats your data in vector form, thus returning all but the very first element (edit: as has been pointed out, that element turns out to be a column, since a data.frame is a list). dataframe[,-1] treats your data in matrix form, returning all but the first column.

Sorry, I wanted to leave this as a comment but it was too big. I just found it interesting that the only one which does not come back as an integer vector is dataframe[1].
Further to Carl's answer, it seems dataframe[[1]] is treated as a matrix as well.
But dataframe[1] isn't...
But it can't really be treated as a matrix, because the results for dataframe[[1]] and matrix[[1]] are different.
D <- as.data.frame(matrix(1:16,4))
D
M <- (matrix(1:16,4))
M
> D[ ,1] # first column of data frame, dropped to a vector
[1] 1 2 3 4
> D[[1]] # first column of dataframe
[1] 1 2 3 4
> D[1] # First column of dataframe
V1
1 1
2 2
3 3
4 4
>
> class(D[ ,1])
[1] "integer"
> class(D[[1]])
[1] "integer"
> class(D[1])
[1] "data.frame"
>
> M[ ,1] # first column of matrix
[1] 1 2 3 4
> M[[1]] # First element of first row & col
[1] 1
> M[1] # First element of first row & col
[1] 1
>
> class(M[ ,1])
[1] "integer"
> class(M[[1]])
[1] "integer"
> class(M[1])
[1] "integer"
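A small sketch of why [[1]] differs between the two objects, rebuilding D and M exactly as above: on a data frame, [[ ]] is list extraction and returns a whole column, while on a matrix a single index refers to the underlying vector in column-major order. Both accept a row and a column index inside [[ ]] to fetch one cell.

```r
D <- as.data.frame(matrix(1:16, 4))
M <- matrix(1:16, 4)

# Data frame: [[1]] is list extraction, so it returns the whole first column
identical(D[[1]], as.list(D)[[1]])   # TRUE
# Matrix: [[1]] and [1] index the underlying vector in column-major order
identical(M[[1]], as.vector(M)[1])   # TRUE
M[[5]]                               # 5, i.e. row 1 of column 2

# With two indices, [[ ]] picks a single cell from either object
D[[2, 3]]                            # 10
M[[2, 3]]                            # 10
```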

Related

Function of unlist() when turning one row of a dataframe to a matrix

What is the difference between matrix(unlist(DF[1,])) and matrix(DF[1,]), where DF is my data frame? How does unlist() help here?

DF[1,] will extract the first row of the data.frame. This row is still a data.frame, which is a type of list. unlist() will convert it to a vector that can be made into a matrix. If you don't use unlist(), then you can still make a matrix, but it is a matrix of the elements of the list rather than of the elements of a vector. For example,
> cars[1,]
speed dist
1 4 2
> a <- matrix(cars[1,])
> b <- matrix(unlist(cars[1,]))
> a[,1]
[[1]]
[1] 4
[[2]]
[1] 2
> b[,1]
[1] 4 2
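The same comparison can be sketched by checking types instead of printing (cars is the example data set from R's datasets package, as in the answer). It shows that the one-row data frame is a list, that unlist() flattens it into a named numeric vector, and that matrix() without unlist() gives a list-matrix.

```r
row1 <- cars[1, ]
is.list(row1)          # TRUE: a one-row data frame is still a list
v <- unlist(row1)      # named numeric vector: speed = 4, dist = 2
b <- matrix(v)         # 2 x 1 numeric matrix
typeof(b)              # "double"

a <- matrix(row1)      # each cell holds a list element, not a number
typeof(a)              # "list"
```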

Prevent [.data.frame drop dimensions where there is only one column

I have a data frame demos, with n columns (depends on external input), where n = 1,2,3 ...
I want to delete certain rows, then add new columns to this data frame. When n > 1, the following code works fine, where demos.part is always an R data.frame.
demos.part <- demos[-i, ] # remove i-th row
demos.part[,"new column name"] <- as.vector(<new data>)
However, when n == 1, demos.part in the first line becomes a vector, and then the second line no longer works.
Of course we could hard-code a fix for this special case. Is there a consistent (elegant) way to remove rows from a data.frame and still get a data.frame back, even if the data frame has only one column?

Your first line, demos.part <- demos[-i, ], only drops from a data frame to a vector if demos has exactly one column:
# One column: result is a vector
> data.frame(a=letters)[1,]
[1] a
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
# 2 cols: result is a df with 1 row
> data.frame(a=letters, b=letters)[1,]
data.frame with 1 row and 2 columns
a b
<factor> <factor>
1 a a
To see why this is, you can inspect the arguments of [.data.frame, where the default value of the drop argument depends on the number of columns:
> args(`[.data.frame`)
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) ==
1)
NULL
Regardless, any time you want to prevent dropping of dimensions, simply add drop=FALSE after any indexing arguments (including intentionally blank indexing arguments; note the empty space between the two commas for the blank column index):
> data.frame(a=letters)[1, , drop=FALSE]
data.frame with 1 row and 1 column
a
<factor>
1 a
You should always use drop=FALSE when the number of rows/columns you select depends on external input, since there is always the possibility that it will select just one column. Alternatively, use the data_frame function from the dplyr package (since deprecated in favour of tibble::tibble()) to create a data frame with fewer weird edge cases in its behavior:
> library(dplyr)
> data_frame(a=letters)[1,]
Source: local data frame [1 x 1]
a
(chr)
1 a
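Applied to the original problem, a minimal sketch assuming a hypothetical one-column demos data frame and made-up values standing in for <new data>: adding drop = FALSE to the row removal keeps demos.part a data frame, so the column assignment works for any number of columns.

```r
# Hypothetical one-column input (the n == 1 case from the question)
demos <- data.frame(age = c(30, 41, 52))
i <- 2

# Default drop: one remaining column collapses to a numeric vector
is.data.frame(demos[-i, ])                       # FALSE

# drop = FALSE keeps a data frame regardless of the number of columns
demos.part <- demos[-i, , drop = FALSE]
demos.part[, "new column name"] <- c("x", "z")   # made-up new data
dim(demos.part)                                  # 2 2
names(demos.part)                                # "age" "new column name"
```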

Responding to your comment about the column names: I don't think they disappear.
Consider the following code:
remove.row <- function(df,n) { as.data.frame(df[-n,]) }
#
a <- data.frame(col1=c(1,2),col2=c("A","B"))
a
class(a)
colnames(a)
#
a <- remove.row(a,1)
a
class(a)
colnames(a)
#
a <- remove.row(a,1)
a
class(a)
colnames(a)
produces:
> a
col1 col2
1 1 A
2 2 B
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
> #
> a <- remove.row(a,1)
> a
col1 col2
2 2 B
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
> #
> a <- remove.row(a,1)
> a
[1] col1 col2
<0 rows> (or 0-length row.names)
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
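One caveat, shown in a small sketch: remove.row() above keeps the column names only while at least two columns are present. With a single column, df[-n, ] has already dropped to an unnamed vector before as.data.frame() sees it, so the original name is lost. The variant remove.row2 (a hypothetical name) uses drop = FALSE instead and keeps it.

```r
remove.row <- function(df, n) { as.data.frame(df[-n, ]) }
# Hypothetical variant that never drops to a vector
remove.row2 <- function(df, n) { df[-n, , drop = FALSE] }

one <- data.frame(col1 = c(1, 2, 3))   # made-up one-column example
names(remove.row(one, 1))    # not "col1": derived from the expression instead
names(remove.row2(one, 1))   # "col1"
```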

Index a Particular Numeric Vector From a List of Vectors in R

In R, for the sake of example, I have a list composed of equal-length numeric vectors of form similar to:
list <- list(c(1,2,3),c(1,3,2),c(2,1,3))
[[1]]
[1] 1 2 3
[[2]]
[1] 1 3 2
[[3]]
[1] 2 1 3
...
Every element of the list is unique. I want to get the index number of the element x <- c(2,1,3), or any other particular numeric vector within the list.
I've attempted using match(x,list), which gives a vector full of NA, and which(list==c(1,2,3)), which gives me a "(list) object cannot be coerced to type 'double'" error. Coercing the list to different types didn't seem to make a difference for the which function. I also attempted various grep* functions, but these don't return exact numeric vector matches. Using find(c(1,2,3),list) or even some fancy sapply/which/%in% combinations didn't give me what I was looking for. I feel like I have a type problem. Any suggestions?
--Update--
Summary of Solutions
Thanks for your replies. The method in the comment for this question is clean and works well (via akrun).
> which(paste(list)==deparse(x))
[1] 25
The next method didn't work correctly
> which(duplicated(c(x, list(y), fromLast = TRUE)))
[1] 49
> y
[1] 1 2 3
This sounds good, but in the next block you can see the problem
> y<-c(1,3,2)
> which(duplicated(c(list, list(y), fromLast = TRUE)))
[1] 49
More fundamentally, there are only 48 elements in the list I was using.
The last method works well (via BondedDust), and I would guess it is more efficient using an apply function:
> which( sapply(list, identical, y ))
[1] 25

match works fine if you pass it the right data.
L <- list(c(1,2,3),c(1,3,2),c(2,1,3))
match(list(c(2,1,3)), L)
#[1] 3
Beware that this works by coercing lists to character, so fringe cases will fail (hat-tip to @nicola):
match(list(1:3),L)
#[1] NA
even though:
1:3 == c(1,2,3)
#[1] TRUE TRUE TRUE
Although arguably:
identical(1:3,c(1,2,3))
#[1] FALSE
identical(1:3,c(1L,2L,3L))
#[1] TRUE
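The character coercion can be inspected directly. A minimal sketch, assuming match() compares the same deparsed strings that as.character() produces for a list: c(2, 1, 3) and 1:3 turn into different strings, even though 1:3 == c(1, 2, 3) is TRUE element-wise.

```r
L <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3))
as.character(L)             # "c(1, 2, 3)" "c(1, 3, 2)" "c(2, 1, 3)"
as.character(list(1:3))     # "1:3", which is not among the strings above
match(list(c(2, 1, 3)), L)  # 3
match(list(1:3), L)         # NA
```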

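You can use duplicated(). If we add the matching vector to the end of the original list and pass fromLast = TRUE to duplicated(), the original copy is flagged as the duplicate. Then we can use which() to get the index.
which(duplicated(c(list, list(c(2, 1, 3))), fromLast = TRUE))
# [1] 3
Make sure fromLast = TRUE is passed to duplicated() and not to c(). Inside c() it is simply appended to the list as an extra element, duplicated() then flags the appended copy of the vector instead, and which() returns length(list) + 1 whenever there is a match.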
Or you could add it as the first element and subtract 1 from the result.
which(duplicated(c(list(c(2, 1, 3)), list))) - 1L
# [1] 3
Note that the type always matters with this type of comparison. When comparing integers and numerics, you will need to convert doubles to integers for this to run without issue. For example, 1:3 is not the same type as c(1, 2, 3).

> L <- list(c(1,2,3),c(1,3,2),c(2,1,3))
> sapply(L, identical, c(2,1,3))
[1] FALSE FALSE TRUE
> which( sapply(L, identical, c(2,1,3)) )
[1] 3
This would be slightly less restrictive in its test:
> which( sapply(L, function(x,y){all(x==y)}, c(1:3)) )
[1] 1

Try:
vapply(list,function(z) all(z==x),TRUE)
#[1] FALSE FALSE TRUE
Wrapping the above line in which() gives you the index of the matching element in the list.
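Putting the vapply() idea together with which(), a minimal sketch assuming the question's x and a small stand-in L for its list. The length check guards against recycling in ==, and the second variant uses isTRUE(all.equal()) so that integer and double vectors with the same values count as a match, unlike identical().

```r
x <- c(2, 1, 3)
L <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3))   # stand-in for the question's list

# Element-wise comparison; the length check stops recycling from faking a match
same_values <- function(z) length(z) == length(x) && all(z == x)
which(vapply(L, same_values, logical(1)))                            # 3

# Type-tolerant variant: 1:3 matches c(1, 2, 3) here, unlike identical()
which(vapply(L, function(z) isTRUE(all.equal(z, 1:3)), logical(1)))  # 1
```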

Why is.vector on a data-frame doesn't return TRUE?

tl;dr - What the hell is a vector in R?
Long version:
Lots of stuff is a vector in R. For instance, a number is a numeric vector of length 1:
is.vector(1)
[1] TRUE
A list is also a vector.
is.vector(list(1))
[1] TRUE
OK, so a list is a vector. And a data frame is a list, apparently.
is.list(data.frame(x=1))
[1] TRUE
But (seemingly violating transitivity), a data frame is not a vector, even though a data frame is a list and a list is a vector. EDIT: It is a vector; it just has additional attributes, which leads to this behavior. See the accepted answer below.
is.vector(data.frame(x=1))
[1] FALSE
How can this be?

To answer your question another way, the R Internals manual lists R's eight built-in vector types: "logical", "numeric", "character", "list", "complex", "raw", "integer", and "expression".
To test whether the non-attribute part of an object is really one of those vector types "underneath it all", you can examine the results of is(), like this:
isVector <- function(X) "vector" %in% is(X)
df <- data.frame(a=1:4)
isVector(df)
# [1] TRUE
# Use isVector() to examine a number of other vector and non-vector objects
la <- structure(list(1:4), mycomment="nothing")
chr <- "word" ## STRSXP
lst <- list(1:4) ## VECSXP
exp <- expression(rnorm(99)) ## EXPRSXP
rw <- raw(44) ## RAWSXP
nm <- as.name("x") ## SYMSXP
pl <- pairlist(b=5:8) ## LISTSXP
sapply(list(df, la, chr, lst, exp, rw, nm, pl), isVector)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE

Illustrating what @joran pointed out: is.vector returns FALSE for a vector that has any attributes other than names (I never knew that)...
# 1) Example of when a vector stops being a vector...
> dubious = 7:11
> attributes(dubious)
NULL
> is.vector(dubious)
[1] TRUE
#now assign some additional attributes
> attributes(dubious) <- list(a = 1:5)
> attributes(dubious)
$a
[1] 1 2 3 4 5
> is.vector(dubious)
[1] FALSE
# 2) Example of how to strip a dataframe of attributes so it looks like a true vector ...
> df = data.frame()
> attributes(df)
$names
character(0)
$row.names
integer(0)
$class
[1] "data.frame"
> attributes(df)[['row.names']] <- NULL
> attributes(df)[['class']] <- NULL
> attributes(df)
$names
character(0)
> is.vector(df)
[1] TRUE
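A short sketch of the same point from another direction, using a made-up one-column data frame: unclass() is not enough for is.vector() because the row.names attribute survives, whereas as.list() returns a plain list that carries only names.

```r
df1 <- data.frame(x = 1)          # made-up example data frame
is.list(df1)                      # TRUE
is.vector(df1)                    # FALSE: class and row.names attributes
is.vector(unclass(df1))           # FALSE: row.names is still set
is.vector(as.list(df1))           # TRUE: only the names attribute remains
names(attributes(as.list(df1)))   # "names"
```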

Not an answer, but here are some other interesting things that are definitely worth investigating. Some of this has to do with the way objects are stored in R.
One example:
If we set up a matrix of one element, that element being a list, we get the following. Even though it's a list, it can be stored in one element of the matrix.
> x <- matrix(list(1:5)) # we already know that list is also a vector
> x
# [,1]
# [1,] Integer,5
Now if we coerce x to a data frame, its dimensions are still (1, 1)
> y <- as.data.frame(x)
> dim(y)
# [1] 1 1
Now, if we look at the first element of y, it's the data frame column,
> y[1]
# V1
# 1 1, 2, 3, 4, 5
But if we look at the first column of y, it's a list
> y[,1]
# [[1]]
# [1] 1 2 3 4 5
which is exactly the same as the first row of y.
> y[1,]
# [[1]]
# [1] 1 2 3 4 5
There are a lot of properties about R objects that are cool to investigate if you have the time.

