Say I have a tibble named `a`. It has three classes:
class(a)
"tbl_df" "tbl" "data.frame"
How can I extract just one of these classes?
a$data.frame
does not work.
Another example is a haven_labelled object, b, which has three classes:
class(b)
"haven_labelled" "vctrs_vctr" "double"
How can I extract just the double part of b?
class() returns an unnamed character vector, which you usually subset using numeric indices x[i], e.g. class(b)[3] to obtain "double".
However, you could apply string matching and write your own my_class() function based on a vector of valid class names.
valid <- c("data.frame", "double", "character")
my_class <- function(x) {k <- class(x);k[k %in% valid]}
my_class(a)
# [1] "data.frame"
my_class(b)
# [1] "double"
Data:
a <- tibble::as_tibble(data.frame())
b <- haven::labelled()
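If you only need to know whether an object belongs to a given class, rather than extract the name, base R's inherits() may already be enough. A minimal sketch, where a is built by hand with structure() as a stand-in for a real tibble (an assumption, so that the example runs without tibble installed):

```r
# Stand-in for a tibble: only the class vector matters here (assumption).
a <- structure(list(), class = c("tbl_df", "tbl", "data.frame"))
valid <- c("data.frame", "double", "character")

# Does a inherit from "data.frame"?
is_df <- inherits(a, "data.frame")

# Position of each entry of valid within class(a), 0 if absent.
pos <- inherits(a, valid, which = TRUE)

# Same idea as my_class(a): keep the classes that are in valid.
found <- intersect(class(a), valid)
```

Here is_df is TRUE, pos marks "data.frame" as the third class, and found is "data.frame".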
I have a list of data frames. I want to perform a bunch of operations within a for loop but before that, I need to extract the string name of each dataset to use as variable/data frame name suffixes.
for(i in dflist) {
suffix<- deparse(substitute(i))
print(suffix)
}
However, my output shows as the following:
[1] "i"
[1] "i"
[1] "i"
I know that this is because of R's lazy evaluation framework. But how do I get around this limitation and get R to assign the names of the data frames in dflist to the suffix variable?
You are almost there. Try to understand what the elements of a list are and how indexing works when you work with lists.
example data
dflist <- list(
name_df1 = data.frame(a = 1:3)
,name_df2 = data.frame(mee = c("A","B","C","D"))
)
understand list elements
What do we access with index i in a for loop over a list?
for(i in dflist){
print(class(i))
print(names(i))
}
[1] "data.frame"
[1] "a"
[1] "data.frame"
[1] "mee"
The loop variable i holds the list element itself, here an object of class data.frame, so names(i) returns that data frame's column names.
your case
for(i in dflist){
suffix <- names(i)
print(suffix)
}
[1] "a"
[1] "mee"
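To get the names of the data frames rather than their column names, one option is to loop over names(dflist) and fetch each element by name. A minimal sketch using the example data above (the variables nm, df and suffixes are my own names, not from the question):

```r
dflist <- list(
  name_df1 = data.frame(a = 1:3),
  name_df2 = data.frame(mee = c("A", "B", "C", "D"))
)

suffixes <- character(0)
for (nm in names(dflist)) {
  df <- dflist[[nm]]          # the data frame itself
  suffix <- nm                # its name in the list, as a string
  suffixes <- c(suffixes, suffix)
  print(suffix)
}
```

This prints "name_df1" and "name_df2", and df is available inside the loop for whatever operations you need.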
I would like to create a + method to paste character objects. One possible approach is by doing it with an infix operator:
`%+%` <- function(x, y) paste0(x, y)
"a" %+% "b" # returns "ab"
However, I'm wondering if it is possible to do the same with an S3 method. You could do it by creating a new class, let's say char, and do something like:
# Create a function to coerce to class 'char'
as.char <- function(x, ...) structure(as.character(x, ...), class = "char")
# S3 method to paste 'char' objects
"+.char" <- function(x, y) paste0(x, y)
a <- as.char("a")
b <- as.char("b")
c(class(a), class(b)) # [1] "char" "char"
a + b # returns "ab"
But, if you try to create a +.character method, it does not work:
"+.character" <- function(x, y) paste0(x, y)
a <- "a"
b <- "b"
c(class(a), class(b)) # [1] "character" "character"
a + b # Error in a + b : non-numeric argument to binary operator
However, if you assign the class character manually it does work:
as.character_ <- function(x, ...) structure(as.character(x, ...), class = "character")
a <- as.character_("a")
b <- as.character_("b")
c(class(a), class(b)) # [1] "character" "character"
a + b # returns "ab"
So I'm just wondering what I'm missing here, and if it is possible to actually define an S3 method for a generic class.
Edit: Based on @Csd's answer, it is not clear that this is due to the attributes, because if you define your own function, e.g.:
concat <- function(e1, e2) UseMethod("concat")
concat.character <- function(e1, e2) paste0(e1, e2)
concat("a", "b") # returns "ab"
Then it does work.
It seems that you need to define the class of the variable to be "character". That's exactly what you do, except for one thing, which I didn't know either.
Here is an example:
a <- "a"
class(a) # "character"
attributes(a) # NULL!!!
while using your function:
a <- as.character_("a")
class(a) # "character"
attributes(a) # "class" is "character"
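So it seems that the class attribute has to be set explicitly on the variable. Internal generics such as + only dispatch on objects for which is.object() is TRUE, i.e. objects that carry an explicit class attribute, whereas UseMethod() also dispatches on the implicit class. That is why concat("a", "b") works but "a" + "b" does not.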
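The difference can be checked with is.object() and oldClass(), which only look at the explicit class attribute. A small sketch (the names plain and tagged are mine):

```r
"+.character" <- function(e1, e2) paste0(e1, e2)

plain  <- "a"                                   # implicit class only
tagged <- structure("a", class = "character")   # explicit class attribute

is.object(plain)    # FALSE: no class attribute, so + never dispatches
is.object(tagged)   # TRUE
oldClass(plain)     # NULL
oldClass(tagged)    # "character"

# With the attribute set on both operands, the S3 method is used.
res <- tagged + structure("b", class = "character")
```

Since paste0() drops attributes, res is a plain "ab" again.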
I have a dataframe "c1" with one column as "region".
sum(is.na(c1$region))
[1] 2
class(c1$region)
[1] "factor"
However, when I use paste()
f1<-paste("c1","$","region",sep="")
> f1
[1] "c1$region"
> sum(is.na(f1))
[1] 0
I tried as.name(f1) and as.symbol(f1). Both convert f1 to the "name" class. noquote(f1) converts the char[1] element to the "noquote" class.
> f2<-as.name(f1)
> f2
`c1$region`
> sum(is.na(f2))
[1] 0
Warning message:
In is.na(f2) : is.na() applied to non-(list or vector) of type 'symbol'
> class(f2)
[1] "name"
I want to retain the class of c1$region while being able to use it in queries such as sum(is.na(f2)). Please help.
I'm not 100% sure I understand what you are trying to do, but maybe this will help:
c1 <- data.frame(region=c(letters[1:3], NA), stringsAsFactors=TRUE) # factor column; the default is FALSE since R 4.0.0
clust <- 1
variable <- "region"
f1 <- get(paste0("c", clust))[[variable]] # <--- key step
class(f1)
# [1] "factor"
sum(is.na(f1))
# [1] 1
In the key step, we use get to fetch the correct cluster data frame using its name as a character vector, and then we use [[, which unlike $, allows us to use a character variable to specify which column we want.
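If you control how the cluster data frames are stored, a named list can replace get() entirely, since [[ accepts a character name at both levels. A sketch under that assumption (the list clusters is hypothetical, not part of the question):

```r
# Hypothetical container holding the cluster data frames by name.
clusters <- list(
  c1 = data.frame(region = factor(c("a", "b", "c", NA)))
)

clust <- 1
variable <- "region"

# [[ takes character values for both the list element and the column.
f1 <- clusters[[paste0("c", clust)]][[variable]]

cls <- class(f1)          # the column keeps its class
n_missing <- sum(is.na(f1))
```

This keeps the factor class and avoids looking up objects in the environment by string.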
I don't understand why the class of a vector is the class of the elements of the vector and not vector itself.
vector <- c("la", "la", "la")
class(vector)
## [1] "character"
matrix <- matrix(1:6, ncol=3, nrow=2)
class(matrix)
## [1] "matrix"
## (in R >= 4.0.0 this prints "matrix" "array")
Here is what I take from this: class is mainly meant for object-oriented programming, and there are other functions in R which will give you the storage mode of an object (see ?typeof or ?mode).
When looking at ?class
Many R objects have a class attribute, a character vector giving the
names of the classes from which the object inherits. If the object
does not have a class attribute, it has an implicit class, "matrix",
"array" or the result of mode(x)
It seems like class works as follows:
1. It first looks for a $class attribute.
2. If there isn't any, it checks whether the object has a matrix or an array structure by checking the $dim attribute (which is not present in a vector).
2.1. If $dim contains two entries, it will call it a matrix.
2.2. If $dim contains one entry or more than two entries, it will call it an array.
2.3. If $dim is of length 0, it goes to the next step (mode).
3. If $dim is of length 0 and there is no $class attribute, it performs mode.
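The steps above can be written down as a small function. This is only a sketch of the described logic, not R's actual implementation: the name implicit_class is mine, real class() returns "integer" rather than mode's "numeric" for integer vectors, and in R >= 4.0.0 a matrix gets both "matrix" and "array".

```r
implicit_class <- function(x) {
  # Step 1: an explicit class attribute wins.
  cls <- attr(x, "class", exact = TRUE)
  if (!is.null(cls)) return(cls)

  # Step 2: otherwise look at the dim attribute.
  d <- attr(x, "dim", exact = TRUE)
  if (length(d) == 2L) return("matrix")
  if (length(d) > 0L) return("array")

  # Step 3: no class and no dim, fall back to mode().
  mode(x)
}

implicit_class(c("la", "la", "la"))
implicit_class(matrix(1:6, ncol = 3, nrow = 2))
implicit_class(array(1:8, c(2, 2, 2)))
implicit_class(data.frame(A = 1))
```

The four calls give "character", "matrix", "array" and "data.frame" respectively.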
So per your example
mat <- matrix(rep("la", 3), ncol=1)
vec <- rep("la", 3)
attributes(vec)
# NULL
attributes(mat)
## $dim
## [1] 3 1
So you can see that vec doesn't contain any attributes whatsoever (see ?c or ?as.vector for explanation)
So in the first case, class performs
attributes(vec)$class
# NULL
length(attributes(vec)$dim)
# 0
mode(vec)
## [1] "character"
In the second case it checks
attributes(mat)$class
# NULL
length(attributes(mat)$dim)
##[1] 2
It sees that the object has two dimensions and therefore calls it a matrix.
In order to illustrate that both vec and mat have same storage mode, you can do
mode(vec)
## [1] "character"
mode(mat)
## [1] "character"
You can also see, for example, same behavior with an array
ar <- array(rep("la", 3), c(3, 1)) # two dimensional array
class(ar)
##[1] "matrix"
## (in R >= 4.0.0 this prints "matrix" "array")
ar <- array(rep("la", 3), c(3, 1, 1)) # three dimensional array
class(ar)
##[1] "array"
So neither array nor matrix sets a class attribute. Let's check, for example, what data.frame does.
df <- data.frame(A = rep("la", 3))
class(df)
## [1] "data.frame"
Where did class take it from?
attributes(df)
# $names
# [1] "A"
#
# $row.names
# [1] 1 2 3
#
# $class
# [1] "data.frame"
As you can see, data.frame sets a $class attribute, but this can be changed
attributes(df)$class <- NULL
class(df)
## [1] "list"
Why list? Because a data.frame doesn't have a $dim attribute (nor a $class one, because we just deleted it), so class performs mode(df)
mode(df)
## [1] "list"
Lastly, in order to illustrate how class works, we can manually set the class to whatever we want and see what it will give us back
mat <- structure(mat, class = "vector")
vec <- structure(vec, class = "vector")
class(mat)
## [1] "vector"
class(vec)
## [1] "vector"
R needs to know the class of the object you are operating on to perform the appropriate method dispatch on that object. The atomic data type in R is a vector, there is no such thing as a scalar, i.e. R considers a single integer a length one vector; is.vector( 1L ).
In order to dispatch the correct method R must know the datatype. It's not much use knowing that something is a vector when your language is implicitly vectorised and everything is designed to operate on a vector.
is.vector( list( NULL , NULL ) ) # TRUE
is.vector( NA ) # TRUE
is.vector( "a" ) # TRUE
is.vector( c( 1.0556 , 2L ) ) # TRUE
So you can take the return value of class( 1L ) which is [1] "integer" to mean, I am an atomic vector consisting of type integer.
Despite the fact that under the hood a matrix is actually just a vector with a dim attribute of length two, R must know it is a matrix so that it can operate row-wise or column-wise on the elements of the matrix (or individually on any single subscripted element). After subsetting, you get back a vector of the datatype of the elements in your matrix, which allows R to dispatch the appropriate method for your data (e.g. performing sort on a character vector or a numeric vector):
/* from the underlying C code in /src/main/subset.c....*/
result = allocVector(TYPEOF(x), (R_xlen_t) nrs * (R_xlen_t) ncs);
You should also note that before determining the class of an object, R will always check that it is first a vector. E.g. when running is.matrix(x) on some matrix x, R checks that it is a vector, and then it checks for a dimension attribute. If the dimension attribute is a vector of INTEGER data type and of LENGTH 2, the object satisfies the conditions for being a matrix (the following code snippet is from Rinlinedfuns.h in /src/include/)
INLINE_FUN Rboolean isMatrix(SEXP s)
{
    SEXP t;
    if (isVector(s)) {
        t = getAttrib(s, R_DimSymbol);
        /* You are not supposed to be able to assign a non-integer dim,
           although this might be possible by misuse of ATTRIB. */
        if (TYPEOF(t) == INTSXP && LENGTH(t) == 2)
            return TRUE;
    }
    return FALSE;
}
# e.g. create an array with height and width....
a <- array( 1:4 , dim=c(2,2) )
# this is a matrix!
class(a)
#[1] "matrix"
# (in R >= 4.0.0 this prints "matrix" "array")
# And the class of the first column is an atomic vector of type integer....
class(a[,1])
#[1] "integer"
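The same check can be seen from the R side: whether something counts as a matrix depends only on the length of its dim attribute. A short sketch (the variable names are mine):

```r
x <- 1:6
plain_is_matrix <- is.matrix(x)      # no dim attribute yet

dim(x) <- c(2L, 3L)                  # dim of length 2
two_d_is_matrix <- is.matrix(x)

dim(x) <- c(1L, 2L, 3L)              # dim of length 3
three_d_is_matrix <- is.matrix(x)
three_d_is_array  <- is.array(x)

dim(x) <- NULL                       # back to a plain integer vector
back_to_vector <- is.vector(x)
```

Only the two-dimensional version passes is.matrix(); the three-dimensional one is an array but not a matrix, and dropping dim gives back a plain vector.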
In the R language definition, there are six basic types of vector, one of which is "character". There really isn't a base "vector" type, but six different kinds of vectors that are all base types.
On the other hand, Matrix is a type of data structure.
Here's the best diagram I've found that lays out the class hierarchy used by the class function:
Although the class names don't correspond exactly with the results of the R class function, I believe the hierarchy is basically accurate. The key to your answer is that the class function only gives the root class in the hierarchy.
You will see that Vector is not a root class. The root class for your example would be StrVector, which corresponds to the "character" class, the class for a vector with character elements. In contrast, Matrix is itself a root class; hence, its class is "matrix".
When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
but:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?
edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?
Let's wrap up multiple comments into an explanation.
The use of apply converts a data.frame to a matrix. This means that the least restrictive class will be used. The least restrictive in this case is character.
You're supplying 1 to apply's MARGIN argument. This applies by row and makes you even worse off, as you're really mixing classes together now. In this scenario you're using apply, which is designed for matrices and data.frames, on a vector. This is not the right tool for the job.
In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column as seen below:
Code:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
In general you choose the apply family member that fits the job. Often I personally use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
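To see the conversion directly, and one way around it for row-wise work, here is a small sketch. It assumes a shortened three-row version of the example data and builds t2 with as.POSIXct() and an explicit UTC time zone, both my choices for reproducibility:

```r
df <- data.frame(v = 1:3, t = 1:3)
df$t2 <- as.POSIXct(df$t, origin = "2013-08-13", tz = "UTC")

# apply() works on as.matrix(df), and a matrix has a single type.
m <- as.matrix(df)
matrix_type <- typeof(m)

# Iterating over row indices keeps each column's class intact.
row_classes <- lapply(seq_len(nrow(df)), function(i) class(df$t2[i]))
```

matrix_type is "character", while every element of row_classes is still "POSIXct" "POSIXt". A function like f(t2, v1, df) could be called the same way, with df$t2[i] and df$v1[i] as arguments.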
May I offer this blog post as an excellent tutorial on what the different apply family of functions do.
Try:
sapply(df, class)
$v
[1] "integer"
$t
[1] "integer"
$t2
[1] "POSIXct" "POSIXt"