Are dataframe[ ,-1] and dataframe[-1] the same? - r

Sorry, this seems like a really silly question, but are dataframe[ ,-1] and dataframe[-1] the same, and does it work for all data types?
And why are they the same?

Almost.
[-1] uses the fact that a data.frame is a list, so when you do dataframe[-1] it returns another data.frame (list) without the first element (i.e. column).
[ ,-1] uses the fact that a data.frame is a two-dimensional array, so when you do dataframe[, -1] you get the sub-array that does not include the first column.
At first glance they look the same, but the second form also tries by default to reduce the dimensions of the sub-array it returns. So, depending on how many columns are left, you may get a data.frame or a vector. For example:
> data <- data.frame(a = 1:2, b = 3:4)
> class(data[-1])
[1] "data.frame"
> class(data[, -1])
[1] "integer"
You can use drop = FALSE to override that behavior:
> class(data[, -1, drop = FALSE])
[1] "data.frame"
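To see where the two forms agree and where they part ways, here is a small sketch with a made-up three-column data frame (df3 and its column names are invented for illustration). While more than one column is left the two forms should give the same result; they only differ once a single column remains.

```r
# Hypothetical data frame; names are for illustration only
df3 <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Two columns remain, so nothing is dropped and both forms agree
same_when_two_left <- isTRUE(all.equal(df3[-1], df3[, -1]))

# One column remains: list-style indexing keeps the data.frame,
# matrix-style indexing drops to a plain vector unless drop = FALSE
one_left_list   <- df3[-(1:2)]
one_left_matrix <- df3[, -(1:2)]
one_left_kept   <- df3[, -(1:2), drop = FALSE]

class(one_left_list)
class(one_left_matrix)
class(one_left_kept)
```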

dataframe[-1] will treat your data in vector form, thus returning all but the very first element, which, as has been pointed out, turns out to be a column, since a data.frame is a list. dataframe[,-1] will treat your data in matrix form, returning all but the first column.

Sorry, I wanted to leave this as a comment but it was too big. I just found it interesting that the only one which does not come back as an integer vector is dataframe[1].
Further to Carl's answer, it seems dataframe[[1]] is treated as a matrix as well.
But dataframe[1] isn't...
But it can't really be treated as a matrix, because the results for dataframe[[1]] and matrix[[1]] are different.
D <- as.data.frame(matrix(1:16,4))
D
M <- (matrix(1:16,4))
M
> D[ ,1] # first column of the data frame, dropped to a vector
[1] 1 2 3 4
> D[[1]] # first column of dataframe
[1] 1 2 3 4
> D[1] # First column of dataframe
V1
1 1
2 2
3 3
4 4
>
> class(D[ ,1])
[1] "integer"
> class(D[[1]])
[1] "integer"
> class(D[1])
[1] "data.frame"
>
> M[ ,1] # first column of the matrix
[1] 1 2 3 4
> M[[1]] # First element of first row & col
[1] 1
> M[1] # First element of first row & col
[1] 1
>
> class(M[ ,1])
[1] "integer"
> class(M[[1]])
[1] "integer"
> class(M[1])
[1] "integer"
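One way to read the difference between D[[1]] and M[[1]], as a rough sketch using the same D and M as above: single-index extraction works on the underlying vector, which for a data frame is a list of columns and for a matrix is an atomic vector of cells.

```r
D <- as.data.frame(matrix(1:16, 4))
M <- matrix(1:16, 4)

is.list(D)   # a data frame is a list with one element per column
is.list(M)   # a matrix is an atomic vector with a dim attribute
length(D)    # number of columns
length(M)    # number of cells

d_first <- D[[1]]  # first list element, i.e. the whole first column
m_first <- M[[1]]  # first cell of the underlying vector
```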

Related

Function of unlist() when turning one row of a dataframe to a matrix

What is the difference between matrix(unlist(DF[1,])) and matrix(DF[1,]) where DF is my dataframe. How does unlist() help here?
DF[1,] will extract the first row of the data.frame. This row is still a data.frame, a type of list. unlist() will convert it to a vector that can be made into a matrix. If you don't use unlist(), then you can still make a matrix, but it is a matrix of the elements of the list, rather than of the elements of a vector. For example,
> cars[1,]
speed dist
1 4 2
> a <- matrix(cars[1,])
> b <- matrix(unlist(cars[1,]))
> a[,1]
[[1]]
[1] 4
[[2]]
[1] 2
> b[,1]
[1] 4 2
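A quick way to confirm the difference is to look at the storage type of each result. The sketch below reuses the built-in cars data; the names row1 and m are only for illustration, and as.matrix() is shown as one possible alternative, not something the answer above relies on.

```r
row1 <- cars[1, ]          # one-row data.frame from the built-in cars data

a <- matrix(row1)          # list matrix: each cell holds a list element
b <- matrix(unlist(row1))  # numeric matrix

typeof(a)
typeof(b)

# as.matrix() is another route; it keeps the row as a 1 x 2 matrix
m <- as.matrix(row1)
dim(m)
```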

Prevent [.data.frame drop dimensions where there is only one column

I have a data frame demos, with n columns (depends on external input), where n = 1,2,3 ...
I want to delete certain rows, then add new columns to this data frame. When n > 1, the following code works fine, where demos.part is always an R data.frame.
demos.part <- demos[-i, ]  # remove the i-th row
demos.part[,"new column name"] <- as.vector(<new data>)
However, when n == 1, the demos.part in the first line becomes a vector. Then the second line does not work anymore.
Of course we can hard code to fix the special case. Is there a consistent (elegant) way to remove rows from data.frame and still return a data.frame, even if the data frame has only one column?
Your first line, demos.part <- demos[-i, ], would only drop from a data frame to a vector if demos has exactly one column:
# One column: result is a vector
> data.frame(a=letters)[1,]
[1] a
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
# 2 cols: result is a df with 1 row
> data.frame(a=letters, b=letters)[1,]
data.frame with 1 row and 2 columns
a b
<factor> <factor>
1 a a
To see why this is, you can inspect the arguments of [.data.frame, where the default value of the drop argument depends on the number of columns:
> args(`[.data.frame`)
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) ==
1)
NULL
Regardless, any time you want to prevent dropping of dimensions, simply add drop=FALSE after any indexing arguments (including intentionally blank indexing arguments; note the empty space between the two commas for the blank column index):
> data.frame(a=letters)[1, , drop=FALSE]
data.frame with 1 row and 1 column
a
<factor>
1 a
You should always use drop=FALSE when the number of rows/columns selected depends on external input, since there is always the possibility that the result ends up with just one column. Alternatively, use the data_frame function from the dplyr package (deprecated in later releases in favor of tibble()) to create a data frame with fewer weird edge cases in its behavior:
> library(dplyr)
> data_frame(a=letters)[1,]
Source: local data frame [1 x 1]
a
(chr)
1 a
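Applied to the code in the question, that could look like the sketch below. The contents of demos, the value of i, and the new column values are all made up here, since the question does not show them.

```r
demos <- data.frame(score = c(10, 20, 30))  # hypothetical one-column input
i <- 2                                      # hypothetical row to remove

demos.part <- demos[-i, , drop = FALSE]     # stays a data.frame
demos.part[, "new column name"] <- c("x", "y")

str(demos.part)
```

The same two lines should work unchanged when demos has more than one column, which is the point of passing drop = FALSE unconditionally.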
Responding to your comment about the colnames: I don't think they disappear.
Consider following code:
remove.row <- function(df,n) { as.data.frame(df[-n,]) }
#
a <- data.frame(col1=c(1,2),col2=c("A","B"))
a
class(a)
colnames(a)
#
a <- remove.row(a,1)
a
class(a)
colnames(a)
#
a <- remove.row(a,1)
a
class(a)
colnames(a)
produces:
> a
col1 col2
1 1 A
2 2 B
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
> #
> a <- remove.row(a,1)
> a
col1 col2
2 2 B
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
> #
> a <- remove.row(a,1)
> a
[1] col1 col2
<0 rows> (or 0-length row.names)
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
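Note that the run above uses a two-column data frame. For the one-column case from the question, the wrapper first drops to a vector and only then converts back, so the original column name may not survive. A sketch of that case, with a variant that passes drop = FALSE instead (remove.row.keep and the data frame one are hypothetical names for illustration):

```r
# Wrapper from the answer above, plus a variant that passes drop = FALSE
remove.row      <- function(df, n) as.data.frame(df[-n, ])
remove.row.keep <- function(df, n) df[-n, , drop = FALSE]

one <- data.frame(col1 = c(1, 2, 3))  # hypothetical one-column data frame

colnames(remove.row(one, 1))       # name comes from the converted vector, not from "col1"
colnames(remove.row.keep(one, 1))  # the original name is kept
```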

Why is.vector on a data-frame doesn't return TRUE?

tl;dr - What the hell is a vector in R?
Long version:
Lots of stuff is a vector in R. For instance, a number is a numeric vector of length 1:
is.vector(1)
[1] TRUE
A list is also a vector.
is.vector(list(1))
[1] TRUE
OK, so a list is a vector. And a data frame is a list, apparently.
is.list(data.frame(x=1))
[1] TRUE
But, (seemingly violating the transitive property), a data frame is not a vector, even though a dataframe is a list, and a list is a vector. EDIT: It is a vector, it just has additional attributes, which leads to this behavior. See accepted answer below.
is.vector(data.frame(x=1))
[1] FALSE
How can this be?
To answer your question another way, the R Internals manual lists R's eight built-in vector types: "logical", "numeric", "character", "list", "complex", "raw", "integer", and "expression".
To test whether the non-attribute part of an object is really one of those vector types "underneath it all", you can examine the results of is(), like this:
isVector <- function(X) "vector" %in% is(X)
df <- data.frame(a=1:4)
isVector(df)
# [1] TRUE
# Use isVector() to examine a number of other vector and non-vector objects
la <- structure(list(1:4), mycomment="nothing")
chr <- "word" ## STRSXP
lst <- list(1:4) ## VECSXP
exp <- expression(rnorm(99)) ## EXPRSXP
rw <- raw(44) ## RAWSXP
nm <- as.name("x") ## SYMSXP
pl <- pairlist(b=5:8) ## LISTSXP
sapply(list(df, la, chr, lst, exp, rw, nm, pl), isVector)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
Illustrating what @joran pointed out, that is.vector returns FALSE on a vector which has any attributes other than names (I never knew that) ...
# 1) Example of when a vector stops being a vector...
> dubious = 7:11
> attributes(dubious)
NULL
> is.vector(dubious)
[1] TRUE
#now assign some additional attributes
> attributes(dubious) <- list(a = 1:5)
> attributes(dubious)
$a
[1] 1 2 3 4 5
> is.vector(dubious)
[1] FALSE
# 2) Example of how to strip a dataframe of attributes so it looks like a true vector ...
> df = data.frame()
> attributes(df)
$names
character(0)
$row.names
integer(0)
$class
[1] "data.frame"
> attributes(df)[['row.names']] <- NULL
> attributes(df)[['class']] <- NULL
> attributes(df)
$names
character(0)
> is.vector(df)
[1] TRUE
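The same point can be checked without editing attributes by hand. This is only a sketch; the "units" attribute is an arbitrary made-up example, and as.vector() / as.list() are used here simply as convenient ways to strip the extra attributes.

```r
x <- c(a = 1, b = 2)
names_only <- is.vector(x)               # names are the one attribute is.vector tolerates

attr(x, "units") <- "cm"                 # hypothetical extra attribute
with_extra <- is.vector(x)

stripped <- is.vector(as.vector(x))      # as.vector() drops attributes of atomic vectors

df <- data.frame(x = 1)
as_plain_list <- is.vector(as.list(df))  # as.list() leaves only the names
```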
Not an answer, but here are some other interesting things that are definitely worth investigating. Some of this has to do with the way objects are stored in R.
One example:
If we set up a matrix of one element, that element being a list, we get the following. Even though it's a list, it can be stored in one element of the matrix.
> x <- matrix(list(1:5)) # we already know that list is also a vector
> x
# [,1]
# [1,] Integer,5
Now if we coerce x to a data frame, its dimensions are still (1, 1)
> y <- as.data.frame(x)
> dim(y)
# [1] 1 1
Now, if we look at the first element of y, it's the data frame column,
> y[1]
# V1
# 1 1, 2, 3, 4, 5
But if we look at the first column of y, it's a list
> y[,1]
# [[1]]
# [1] 1 2 3 4 5
which is exactly the same as the first row of y.
> y[1,]
# [[1]]
# [1] 1 2 3 4 5
There are a lot of properties about R objects that are cool to investigate if you have the time.

counting vectors with NA included

By mistake, I found that R counts the elements of a vector containing NA in an interesting way:
> temp <- c(NA,NA,NA,1) # 4 items
> length(temp[temp>1])
[1] 3
> temp <- c(NA,NA,1) # 3 items
> length(temp[temp>1])
[1] 2
At first I assumed R would collapse all the NAs into one NA, but this is not the case.
Can anyone explain? Thanks.
You were expecting only TRUE's and FALSE's (and the results to only be FALSE) but a logical vector can also have NA's. If you were hoping for a length zero result, then you had at least three other choices:
> temp <- c(NA,NA,NA,1) # 4 items
> length(temp[ which(temp>1) ] )
[1] 0
> temp <- c(NA,NA,NA,1) # 4 items
> length(subset( temp, temp>1) )
[1] 0
> temp <- c(NA,NA,NA,1) # 4 items
> length( temp[ !is.na(temp) & temp>1 ] )
[1] 0
You will find the last form in a lot of the internal code of well established functions. I happen to think the first version is more economical and easier to read, but the R Core seems to disagree. I have several times been advised on R help not to use which() around logical expressions. I remain unconvinced. It is correct that one should not combine it with negative indexing.
EDIT: The reason not to use the construct "minus which" (negative indexing with which) is that in the case where all the items fail the which-test, and where you would therefore expect all of them to be returned, it returns an unexpected empty vector:
temp <- c(1,2,3,4,NA)
temp[!temp > 5]
#[1] 1 2 3 4 NA As expected
temp[-which(temp > 5)]
#numeric(0) Not as expected
temp[!temp > 5 & !is.na(temp)]
#[1] 1 2 3 4 A correct way to handle negation
I admit that the notion that NA's should select NA elements seems a bit odd, but it is rooted in the history of S and therefore R. There is a section in ?"[" about "NA's in indexing". The rationale is that each NA as an index should return an unknown result, i.e. another NA.
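If you do want negative indexing with which(), one common workaround is to guard against the empty case. A minimal sketch, where drop_where is a hypothetical helper name and not a base R function:

```r
temp <- c(1, 2, 3, 4, NA)

# Hypothetical helper: drop the elements where `cond` is TRUE,
# keeping NAs and coping with the "nothing matched" case
drop_where <- function(x, cond) {
  idx <- which(cond)
  if (length(idx) > 0) x[-idx] else x
}

drop_where(temp, temp > 5)  # nothing matches, so everything is kept
drop_where(temp, temp > 2)  # removes 3 and 4, keeps the NA
```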
If you break down each command and look at the output, it's more enlightening:
> tmp = c(NA, NA, 1)
> tmp > 1
[1] NA NA FALSE
> tmp[tmp > 1]
[1] NA NA
So, when we next perform length(tmp[tmp > 1]), it's as if we're executing length(c(NA,NA)). It is fine to have a vector full of NAs: it has a fixed length (as if we'd created it via NA * vector(length = 2), which should be different from NA * vector(length = 3)).
You can use 'sum':
> tmp <- c(NA, NA, NA, 3)
> sum(tmp > 1)
[1] NA
> sum(tmp > 1, na.rm=TRUE)
[1] 1
A bit of explanation: 'sum' expects numbers but 'tmp > 1' is logical. So it is automatically coerced to be numeric: TRUE => 1; FALSE => 0; NA => NA.
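Putting the pieces together, the length seen in the question is the number of TRUEs plus the number of NAs in the comparison. A small sketch (the variable names are only for illustration):

```r
tmp <- c(NA, NA, NA, 3)

n_true <- sum(tmp > 1, na.rm = TRUE)  # count of TRUEs, NAs ignored
n_na   <- sum(is.na(tmp > 1))         # comparisons whose result is unknown
n_sel  <- length(tmp[tmp > 1])        # TRUEs plus NAs, as in the question
```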
I don't think there is anything precisely like this in 'The R Inferno' but this is definitely the sort of question that it is aimed at. http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
