I have a data frame demos, with n columns (depends on external input), where n = 1,2,3 ...
I want to delete certain rows, then add new columns to this data frame. When n > 1, the following code works fine, where demos.part is always an R data.frame.
demos.part <- demos[-i, ] // remove i-th row
demos.part[,"new column name"] <- as.vector(<new data>)
However when n == 1, the demos.part in the first line becomes an vector. Then the second line does not work anymore.
Of course we can hard code to fix the special case. Is there a consistent (elegant) way to remove rows from data.frame and still return a data.frame, even if the data frame has only one column?
Your first line, demos.part <- demos[-i, ], would only drop from a data frame to a matrix if demis.part has exactly one column:
# One column: result is a vector
> data.frame(a=letters)[1,]
[1] a
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
# 2 cols: result is a df with 1 row
> data.frame(a=letters, b=letters)[1,]
data.frame with 1 row and 2 columns
a b
<factor> <factor>
1 a a
To see why this is, you can inspect the arguments of [.data.frame, where the default value of the drop argument depends on the number of columns:
> args(`[.data.frame`)
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) ==
1)
NULL
Regardless, any time you want to prevent dropping of dimensions, simply add drop=FALSE after any indexing arguments (including intentionally blank indexing arguments; note the empty space between the two commas for the blank column index):
> data.frame(a=letters)[1, , drop=FALSE]
data.frame with 1 row and 1 column
a
<factor>
1 a
You should always use drop=FALSE when deciding how many rows/columns to select based on external input, since there is always the possibility that it will select just one row. Alternatively, use the data_frame function from the dplyr package to create a data frame with fewer weird edge cases in its behavior:
> library(dplyr)
> data_frame(a=letters)[1,]
Source: local data frame [1 x 1]
a
(chr)
1 a
Responding to your command about the colnames - i don't think they disappear.
Consider following code:
remove.row <- function(df,n) { as.data.frame(df[-n,]) }
#
a <- data.frame(col1=c(1,2),col2=c("A","B"))
a
class(a)
colnames(a)
#
a <- remove.row(a,1)
a
class(a)
colnames(a)
#
a <- remove.row(a,1)
a
class(a)
colnames(a)
produces:
> a
col1 col2
1 1 A
2 2 B
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
> #
> a <- remove.row(a,1)
> a
col1 col2
2 2 B
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
> #
> a <- remove.row(a,1)
> a
[1] col1 col2
<0 rows> (or 0-length row.names)
> class(a)
[1] "data.frame"
> colnames(a)
[1] "col1" "col2"
Related
sorry for the elementary question but I need to partition a list of numbers at an offset of 1.
e.g.,
i have a list like:
c(194187, 193668, 192892, 192802 ..)
and need a list of lists like:
c(c(194187, 193668), c(193668, 192892), c(192892, 192802)...)
where the last element of list n is the first of list n+1. there must be a way to do this with
split()
but I can't figure it out
in mathematica, the command i need is Partition[list,2,1]
You can try like this, using zoo library
library(zoo)
x <- 1:10 # Vector of 10 numbers
m <- rollapply(data = x, 2, by=1, c) # Creates a Matrix of rows = n-1, each row as a List
l <- split(m, row(m)) #splitting the matrix into individual list
Output:
> l
$`1`
[1] 1 2
$`2`
[1] 2 3
$`3`
[1] 3 4
Here is an option using base R to create a vector of elements
v1 <- rbind(x[-length(x)], x[-1])
c(v1)
#[1] 194187 193668 193668 192892 192892 192802
If we need a list
split(v1, col(v1))
data
x <- c(194187, 193668, 192892, 192802);
tl;dr - What the hell is a vector in R?
Long version:
Lots of stuff is a vector in R. For instance, a number is a numeric vector of length 1:
is.vector(1)
[1] TRUE
A list is also a vector.
is.vector(list(1))
[1] TRUE
OK, so a list is a vector. And a data frame is a list, apparently.
is.list(data.frame(x=1))
[1] TRUE
But, (seemingly violating the transitive property), a data frame is not a vector, even though a dataframe is a list, and a list is a vector. EDIT: It is a vector, it just has additional attributes, which leads to this behavior. See accepted answer below.
is.vector(data.frame(x=1))
[1] FALSE
How can this be?
To answer your question another way, the R Internals manual lists R's eight built-in vector types: "logical", "numeric", "character", "list", "complex", "raw", "integer", and "expression".
To test whether the non-attribute part of an object is really one of those vector types "underneath it all", you can examine the results of is(), like this:
isVector <- function(X) "vector" %in% is(X)
df <- data.frame(a=1:4)
isVector(df)
# [1] TRUE
# Use isVector() to examine a number of other vector and non-vector objects
la <- structure(list(1:4), mycomment="nothing")
chr <- "word" ## STRSXP
lst <- list(1:4) ## VECSXP
exp <- expression(rnorm(99)) ## EXPRSXP
rw <- raw(44) ## RAWSXP
nm <- as.name("x") ## LANGSXP
pl <- pairlist(b=5:8) ## LISTSXP
sapply(list(df, la, chr, lst, exp, rw, nm, pl), isVector)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
Illustrating what #joran pointed out, that is.vector returns false on a vector which has any attributes other than names (I never knew that) ...
# 1) Example of when a vector stops being a vector...
> dubious = 7:11
> attributes(dubious)
NULL
> is.vector(dubious)
[1] TRUE
#now assign some additional attributes
> attributes(dubious) <- list(a = 1:5)
> attributes(dubious)
$a
[1] 1 2 3 4 5
> is.vector(dubious)
[1] FALSE
# 2) Example of how to strip a dataframe of attributes so it looks like a true vector ...
> df = data.frame()
> attributes(df)
$names
character(0)
$row.names
integer(0)
$class
[1] "data.frame"
> attributes(df)[['row.names']] <- NULL
> attributes(df)[['class']] <- NULL
> attributes(df)
$names
character(0)
> is.vector(df)
[1] TRUE
Not an answer, but here are some other interesting things that are definitely worth investigating. Some of this has to do with the way objects are stored in R.
One example:
If we set up a matrix of one element, that element being a list, we get the following. Even though it's a list, it can be stored in one element of the matrix.
> x <- matrix(list(1:5)) # we already know that list is also a vector
> x
# [,1]
# [1,] Integer,5
Now if we coerce x to a data frame, it's dimensions are still (1, 1)
> y <- as.data.frame(x)
> dim(y)
# [1] 1 1
Now, if we look at the first element of y, it's the data frame column,
> y[1]
# V1
# 1 1, 2, 3, 4, 5
But if we look at the first column of, y, it's a list
> y[,1]
# [[1]]
# [1] 1 2 3 4 5
which is exactly the same as the first row of y.
> y[1,]
# [[1]]
# [1] 1 2 3 4 5
There are a lot of properties about R objects that are cool to investigate if you have the time.
Sorry this seems like a really silly question but are dataframe[ ,-1] and dataframe[-1] the same, and does it work for all data types?
And why are they the same
Almost.
[-1] uses the fact that a data.frame is a list, so when you do dataframe[-1] it returns another data.frame (list) without the first element (i.e. column).
[ ,-1]uses the fact that a data.frame is a two dimensional array, so when you do dataframe[, -1] you get the sub-array that does not include the first column.
A priori, they sound like the same, but the second case also tries by default to reduce the dimension of the subarray it returns. So depending on the dimensions of your dataframe you may get a data.frame or a vector, see for example:
> data <- data.frame(a = 1:2, b = 3:4)
> class(data[-1])
[1] "data.frame"
> class(data[, -1])
[1] "integer"
You can use drop = FALSE to override that behavior:
> class(data[, -1, drop = FALSE])
[1] "data.frame"
dataframe[-1] will treat your data in vector form, thus returning all but the very first element [[edit]] which as has been pointed out, turns out to be a column, as a data.frame is a list. dataframe[,-1] will treat your data in matrix form, returning all but the first column.
Sorry, wanted to leave this as a comment but thought it was too big, I just found it interesting that the only one which remains a non integer is dataframe[1].
Further to Carl's answer, it seems dataframe[[1]] is treated as a matrix as well.
But dataframe[1] isn't....
But it can't be treated as a matrix cause the results for dataframe[[1]] and matrix[[1]] are different.
D <- as.data.frame(matrix(1:16,4))
D
M <- (matrix(1:16,4))
M
> D[ ,1] # data frame leaving out first column
[1] 1 2 3 4
> D[[1]] # first column of dataframe
[1] 1 2 3 4
> D[1] # First column of dataframe
V1
1 1
2 2
3 3
4 4
>
> class(D[ ,1])
[1] "integer"
> class(D[[1]])
[1] "integer"
> class(D[1])
[1] "data.frame"
>
> M[ ,1] # matrix leaving out first column
[1] 1 2 3 4
> M[[1]] # First element of first row & col
[1] 1
> M[1] # First element of first row & col
[1] 1
>
> class(M[ ,1])
[1] "integer"
> class(M[[1]])
[1] "integer"
> class(M[1])
[1] "integer"
Sorry this seems like a really silly question but are dataframe[ ,-1] and dataframe[-1] the same, and does it work for all data types?
And why are they the same
Almost.
[-1] uses the fact that a data.frame is a list, so when you do dataframe[-1] it returns another data.frame (list) without the first element (i.e. column).
[ ,-1]uses the fact that a data.frame is a two dimensional array, so when you do dataframe[, -1] you get the sub-array that does not include the first column.
A priori, they sound like the same, but the second case also tries by default to reduce the dimension of the subarray it returns. So depending on the dimensions of your dataframe you may get a data.frame or a vector, see for example:
> data <- data.frame(a = 1:2, b = 3:4)
> class(data[-1])
[1] "data.frame"
> class(data[, -1])
[1] "integer"
You can use drop = FALSE to override that behavior:
> class(data[, -1, drop = FALSE])
[1] "data.frame"
dataframe[-1] will treat your data in vector form, thus returning all but the very first element [[edit]] which as has been pointed out, turns out to be a column, as a data.frame is a list. dataframe[,-1] will treat your data in matrix form, returning all but the first column.
Sorry, wanted to leave this as a comment but thought it was too big, I just found it interesting that the only one which remains a non integer is dataframe[1].
Further to Carl's answer, it seems dataframe[[1]] is treated as a matrix as well.
But dataframe[1] isn't....
But it can't be treated as a matrix cause the results for dataframe[[1]] and matrix[[1]] are different.
D <- as.data.frame(matrix(1:16,4))
D
M <- (matrix(1:16,4))
M
> D[ ,1] # data frame leaving out first column
[1] 1 2 3 4
> D[[1]] # first column of dataframe
[1] 1 2 3 4
> D[1] # First column of dataframe
V1
1 1
2 2
3 3
4 4
>
> class(D[ ,1])
[1] "integer"
> class(D[[1]])
[1] "integer"
> class(D[1])
[1] "data.frame"
>
> M[ ,1] # matrix leaving out first column
[1] 1 2 3 4
> M[[1]] # First element of first row & col
[1] 1
> M[1] # First element of first row & col
[1] 1
>
> class(M[ ,1])
[1] "integer"
> class(M[[1]])
[1] "integer"
> class(M[1])
[1] "integer"
So let's say I have a matrix, mdat and I only know the index number. How do I retrieve the column and row names? For example:
> mdat <- matrix(c(1,2,3, 11,12,13), nrow = 2, ncol=3, byrow=TRUE,
dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3")))
> mdat[4]
[1] 12
> names(mdat[4])
NULL
> colnames(mdat[4])
NULL
> rownames(mdat[4])
NULL
> dimnames(mdat[4])
NULL
First you need to get the row and column of that index using arrayInd.
k <- arrayInd(4, dim(mdat))
You can then get the right name by getting that element of the row and column names
rownames(mdat)[k[,1]]
colnames(mdat)[k[,2]]
Or both at once using mapply:
mapply(`[[`, dimnames(mdat), k)
Subsetting the matrix first results in a one-element vector that has no names, as you show in your question. Remember that subsetting creates a completely new object via copying. There's no way to reference the original mdat after subsetting.
This is more clear if you assign the result of subsetting to another object.
> m <- mdat[4]
> m
[1] 12
> names(m) # no names were printed above... so
NULL
You really want to access the column/row names first and subset them.
> colnames(mdat)[3]
[1] "C.3"
> rownames(mdat)[2]
[1] "row2"
You can re-assign column/row names similarly.
> colnames(mdat)[3] <- "C3"
> rownames(mdat)[2] <- "row.2"
> mdat
C.1 C.2 C3
row1 1 2 3
row.2 11 12 13