I have a data frame "c1" with a column named "region":
sum(is.na(c1$region))
[1] 2
class(c1$region)
[1] "factor"
However, when I build the reference with paste():
f1<-paste("c1","$","region",sep="")
> f1
[1] "c1$region"
> sum(is.na(f1))
[1] 0
I tried as.name(f1) and as.symbol(f1). Both convert f1 to the "name" class, and noquote(f1) converts the length-one character vector to the "noquote" class.
> f2<-as.name(f1)
> f2
`c1$region`
> sum(is.na(f2))
[1] 0
Warning message:
In is.na(f2) : is.na() applied to non-(list or vector) of type 'symbol'
> class(f2)
[1] "name"
I want to keep the class of c1$region while still being able to use the constructed reference in expressions such as sum(is.na(f2)). Please help.
I'm not 100% sure I understand what you are trying to do, but maybe this will help:
c1 <- data.frame(region=c(letters[1:3], NA))
clust <- 1
variable <- "region"
f1 <- get(paste0("c", clust))[[variable]] # <--- key step
class(f1)
# [1] "factor"
sum(is.na(f1))
# [1] 1
In the key step, we use get to fetch the correct cluster data frame from its name as a character string, and then we use [[, which, unlike $, lets us specify the column we want with a character variable.
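If you already have the full string built with paste(), as in the question, a hedged alternative is to parse and evaluate it. This sketch assumes the string comes from trusted code, since eval(parse()) will run any expression it is given; the get()/[[ approach above is generally the safer choice.

```r
c1 <- data.frame(region = factor(c("a", "b", NA, NA)))
f1 <- paste("c1", "$", "region", sep = "")
# parse the string "c1$region" into an expression and evaluate it
col <- eval(parse(text = f1))
class(col)
# [1] "factor"
sum(is.na(col))
# [1] 2
```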
How do I convert one object b to the type of another object a without explicitly specifying the type?
a <- letters
b <- as.factor(a)
typeof(a)
#> [1] "character"
So I would like to convert b to typeof(a), but without explicitly using as.character, because in another instance a might be, e.g., an integer. This obviously does not work:
typeof(b) <- typeof(a)
The closest I could get is the following, but I'm not sure whether there is a better solution:
a <- '1'
b <- 2
a <- unlist(lapply(a,paste0('as.',class(b))))
a
a <- '245'
a <- unlist(lapply(a,paste0('as.',class(b))))
a
Output:
> a
[1] 245
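A slightly more general sketch of the same idea uses typeof() and base R's match.fun() to look up the matching as.<type> function. convert_to_type_of is a hypothetical helper name, and it assumes an as.<type> function exists for the target type (true for the atomic types and for "list", but not, e.g., for "closure").

```r
# Hypothetical helper: coerce x to the type of template
convert_to_type_of <- function(x, template) {
  # look up as.character, as.integer, as.double, ... by name
  coerce <- match.fun(paste0("as.", typeof(template)))
  coerce(x)
}

a <- letters
b <- as.factor(a)
res <- convert_to_type_of(b, a)
typeof(res)
# [1] "character"
identical(res, a)
# [1] TRUE
```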
This is similar to @amrrs's answer, except that I think you may want to use this functionality programmatically. When converting the type of an object, values that cannot be coerced to the new type become NA (with a warning), which is unwanted behaviour.
The function below guards against this using R's coercion hierarchy (assuming you want to use class() rather than typeof()):
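convertClass <- function(object1, object2){
  # R's implicit coercion hierarchy, from least to most general
  logic <- c("logical", "integer", "numeric", "complex", "character", "list")
  from <- match(class(object1), logic)
  to <- match(class(object2), logic)
  # if/else instead of ifelse(), which would truncate vector results to length 1
  if (!is.na(from) && !is.na(to) && from < to) {
    # look up as.<class> (e.g. as.character) instead of building code with eval(parse())
    match.fun(paste0("as.", class(object2)))(object1)
  } else {
    paste0("convertClass() cannot convert type ", class(object1), " to ", class(object2))
  }
}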
> convertClass(1,'1')
[1] "1"
> convertClass('1', 1)
[1] "convertClass() cannot convert type character to numeric"
Using this loses the ability to convert, for example, "1" to 1 (which R can coerce), but it provides a strict safeguard when you don't know the type of the variable you are passing to the function.
I have the following list:
test_list=list(list(a=1,b=2),list(a=3,b=4))
and I want to extract every element named a.
I can do this via
sapply(test_list,`[[`,"a")
which gives me the correct result
#[1] 1 3
When I try the same with R's dollar operator $, I get NULL:
sapply(test_list,`$`,"a")
#[[1]]
#NULL
#
#[[2]]
#NULL
However, if I use it on a single element of test_list, it works as expected:
`$`(test_list[[1]],"a")
#[1] 1
Am I missing something obvious here?
evaluation vs. none
[[ evaluates its argument, whereas $ does not. L[[a]] gets the component of L whose name is held in the variable a. $ just passes the argument name itself as a character string, so L$a finds the "a" component of L; a is not regarded as a variable holding the component name, just as a character string.
Below, L[[b]] returns the component of L named "a" because the variable b has the value "a", whereas L$b returns the component of L named "b" because with that syntax b is not regarded as a variable but as a character string, which is passed as-is.
L <- list(a = 1, b = 2)
b <- "a"
L[[b]] # same as L[["a"]] since b holds a
## [1] 1
L$b # same as L[["b"]] since b is regarded as a character string to be passed
## [1] 2
sapply
Now that we understand the key difference between $ and [[, consider this example to see what is going on with sapply. We have made each element of test_list into a "foo" object and defined our own $.foo and [[.foo methods, which simply show what R is passing to the method via the name argument:
foo_list <- test_list
class(foo_list[[1]]) <- class(foo_list[[2]]) <- "foo"
"$.foo" <- "[[.foo" <- function(x, name) print(name)
result <- sapply(foo_list, "$", "a")
## "..."
## "..."
result2 <- sapply(foo_list, "[[", "a")
## [1] "a"
## [1] "a"
What is happening in the first case is that sapply calls whatever$..., and ... is not evaluated, so $ looks for a list component literally named "...". There is, of course, no such component, so whatever$... is NULL, hence the NULLs in the output shown in the question. In the second case, whatever[[...]] evaluates to whatever[["a"]], hence the observed result.
From what I've been able to determine, it's a combination of two things.
First, the second argument of $ is matched but not evaluated, so it cannot be a variable.
Second, when arguments are passed to functions, they are assigned to the corresponding variables in the function call. When "a" is passed through sapply, it is assigned to a variable and therefore no longer works with $. We can see this occurring by running:
sapply("a", print)
[1] "a"
a
"a"
This can lead to peculiar results like this
sapply(test_list, function(x, a) {`$`(x, a)})
[1] 1 3
Here, even though a is a function argument (one that is never even supplied), $ does not evaluate it; it simply uses the name "a" and matches it against the names of the list elements.
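In practice, the usual workarounds are to use [[ (which evaluates its argument) or to write the name literally inside an anonymous function. A minimal sketch, with vapply() added as an optional check that every result is a single number:

```r
test_list <- list(list(a = 1, b = 2), list(a = 3, b = 4))
# [[ evaluates "a", so it can be passed through sapply's ... argument
r1 <- sapply(test_list, `[[`, "a")
# $ works when the name is written literally inside the function
r2 <- sapply(test_list, function(x) x$a)
# vapply also checks that each result is a length-one numeric
r3 <- vapply(test_list, `[[`, numeric(1), "a")
r3
# [1] 1 3
```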
Why does get() in combination with paste() work for data frames but not for columns within a data frame? How can I make it work?
ab<-12
get(paste("a","b",sep=""))
# gives: [1] 12
ab<-data.frame(a=1:3,b=3:5)
ab$a
#gives: [1] 1 2 3
get(paste("a","b",sep=""))
# gives the whole dataframe
get(paste("ab$","a",sep=""))
# gives: Error in get(paste("ab$", "a", sep = "")) : object 'ab$a' not found
Columns in data frames are not first-class objects. Their "names" are really index values for list extraction. Despite the understandable confusion caused by the existence of the names function, they are not true R names (i.e. unquoted tokens or symbols) in the list of R objects; see the ?is.symbol help page. The get function takes a character value, looks for an object with that name in the workspace, and returns it for further processing.
> ab<-data.frame(a=1:3,b=3:5)
> ab$a
[1] 1 2 3
> get(paste("a","b",sep=""))
a b
1 1 3
2 2 4
3 3 5
>
> # and this would be the way to get the 'a' column of the ab object
get(paste("ab",sep=""))[['a']]
If there were a named object target with the value "a", then you could also do:
target <- "a"
get(paste("ab",sep=""))[[target]] # notice no quotes around target
# because `target` is a _real_ R name
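If the input really arrives as a single "object$column" string, one hedged option is to split it and combine get() with [[. get_column is a hypothetical helper, and it assumes the string contains exactly one $ and a plain column name (no backticks or nested $):

```r
# Hypothetical helper: resolve a string like "ab$a" to a column
get_column <- function(spec, envir = parent.frame()) {
  # split "ab$a" into c("ab", "a"); fixed = TRUE treats $ literally
  parts <- strsplit(spec, "$", fixed = TRUE)[[1]]
  # fetch the data frame by name, then extract the column with [[
  get(parts[1], envir = envir)[[parts[2]]]
}

ab <- data.frame(a = 1:3, b = 3:5)
get_column("ab$a")
# [1] 1 2 3
```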
It doesn't work because get() interprets the string it is passed as referring to an object named "ab$a", not to the element named "a" of the object named "ab". Here's probably the best way to see what that means:
ab<-data.frame(a=1:3,b=3:5)
`ab$a` <- letters[1:3]
get("ab$a")
# [1] "a" "b" "c"
I don't understand why the class of a vector is the class of the elements of the vector and not vector itself.
vector <- c("la", "la", "la")
class(vector)
## [1] "character"
matrix <- matrix(1:6, ncol=3, nrow=2)
class(matrix)
## [1] "matrix"   (R >= 4.0.0 returns "matrix" "array")
Here is my understanding. class is mainly meant for object-oriented programming; other functions in R give you the storage mode of an object (see ?typeof or ?mode).
From ?class:
Many R objects have a class attribute, a character vector giving the names of the classes from which the object inherits. If the object does not have a class attribute, it has an implicit class, "matrix", "array" or the result of mode(x)
It seems that class works as follows:
1. It first looks for a $class attribute.
2. If there isn't one, it checks whether the object has a matrix or array structure by looking at the $dim attribute (which is not present in a vector):
   2.1. if $dim contains two entries, it calls it a matrix;
   2.2. if $dim contains one entry or more than two entries, it calls it an array;
   2.3. if $dim is of length 0, it goes to the next step (mode).
3. If $dim is of length 0 and there is no $class attribute, it returns mode(x) (integer vectors are an exception: their implicit class is "integer").
So, for your example:
mat <- matrix(rep("la", 3), ncol=1)
vec <- rep("la", 3)
attributes(vec)
# NULL
attributes(mat)
## $dim
## [1] 3 1
So you can see that vec has no attributes at all (see ?c or ?as.vector for an explanation).
So in the first case, class checks:
attributes(vec)$class
# NULL
length(attributes(vec)$dim)
# 0
mode(vec)
## [1] "character"
In the second case it checks
attributes(mat)$class
# NULL
length(attributes(mat)$dim)
##[1] 2
It sees that the object has two dimensions and therefore calls it a matrix.
To illustrate that vec and mat have the same storage mode, you can do:
mode(vec)
## [1] "character"
mode(mat)
## [1] "character"
You can also see the same behavior with an array, for example:
ar <- array(rep("la", 3), c(3, 1)) # two dimensional array
class(ar)
##[1] "matrix"
ar <- array(rep("la", 3), c(3, 1, 1)) # three dimensional array
class(ar)
##[1] "array"
So neither array nor matrix sets a class attribute. Let's check, for example, what data.frame does.
df <- data.frame(A = rep("la", 3))
class(df)
## [1] "data.frame"
Where did class take it from?
attributes(df)
# $names
# [1] "A"
#
# $row.names
# [1] 1 2 3
#
# $class
# [1] "data.frame"
As you can see, data.frame sets a $class attribute, but this can be changed:
attributes(df)$class <- NULL
class(df)
## [1] "list"
Why list? Because a data frame doesn't have a $dim attribute (nor a $class one, because we just deleted it), so class falls back to mode(df):
mode(df)
## [1] "list"
Lastly, in order to illustrate how class works, we can manually set the class to whatever we want and see what it will give us back
mat <- structure(mat, class = "vector")
vec <- structure(vec, class = "vector")
class(mat)
## [1] "vector"
class(vec)
## [1] "vector"
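The steps described above can be sketched as a small R function. implicit_class is a hypothetical name; it follows the rules as described in this answer (plus the integer exception) and is only an approximation: in R >= 4.0.0, class() returns c("matrix", "array") for matrices, which this sketch does not reproduce.

```r
# Hypothetical helper mirroring the steps described above
implicit_class <- function(x) {
  cls <- attr(x, "class")
  if (!is.null(cls)) return(cls)        # 1. explicit class attribute
  d <- attr(x, "dim")
  if (length(d) == 2) return("matrix")  # 2.1 two dimensions
  if (length(d) > 0) return("array")    # 2.2 one or more than two dimensions
  if (is.integer(x)) return("integer")  # class() says "integer", mode() says "numeric"
  mode(x)                               # 3. fall back to the storage mode
}

implicit_class(rep("la", 3))                     # "character"
implicit_class(matrix(rep("la", 3), ncol = 1))   # "matrix"
implicit_class(array(rep("la", 3), c(3, 1, 1)))  # "array"
implicit_class(data.frame(A = rep("la", 3)))     # "data.frame"
```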
R needs to know the class of the object you are operating on in order to perform the appropriate method dispatch on it. The atomic data type in R is the vector; there is no such thing as a scalar, i.e. R considers a single integer a length-one vector: is.vector( 1L ) is TRUE.
To dispatch the correct method, R must know the data type. It isn't much use knowing that something is a vector when your language is implicitly vectorised and everything is designed to operate on vectors.
is.vector( list( NULL , NULL ) )
is.vector( NA )
is.vector( "a" )
is.vector( c( 1.0556 , 2L ) )
So you can take the return value of class( 1L ), which is [1] "integer", to mean "I am an atomic vector of type integer".
Even though under the hood a matrix is just a vector with a dim attribute of length 2, R must know it is a matrix so that it can operate row-wise or column-wise on its elements (or individually on any single subscripted element). Subsetting returns a vector of the same data type as the matrix elements, which lets R dispatch the appropriate method for your data (e.g. sort on a character vector versus a numeric vector):
/* from the underlying C code in /src/main/subset.c....*/
result = allocVector(TYPEOF(x), (R_xlen_t) nrs * (R_xlen_t) ncs);
You should also note that before determining the class of an object, R first checks that it is a vector: e.g. when running is.matrix(x) on some matrix x, R first checks that x is a vector and then checks its dimension attribute. If the dim attribute is a vector of type INTEGER and LENGTH 2, the object satisfies the conditions for being a matrix (the following snippet is from Rinlinedfuns.h in /src/include/):
INLINE_FUN Rboolean isMatrix(SEXP s)
{
    SEXP t;
    if (isVector(s)) {
        t = getAttrib(s, R_DimSymbol);
        /* You are not supposed to be able to assign a non-integer dim,
           although this might be possible by misuse of ATTRIB. */
        if (TYPEOF(t) == INTSXP && LENGTH(t) == 2)
            return TRUE;
    }
    return FALSE;
}
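The same check can be observed from R: giving a plain vector an integer dim attribute of length 2 is enough for is.matrix() to return TRUE, while a length-3 dim attribute makes it an array instead.

```r
v <- 1:6
is.matrix(v)              # FALSE: no dim attribute yet
dim(v) <- c(2L, 3L)       # integer dim attribute of length 2
is.matrix(v)              # TRUE
typeof(attr(v, "dim"))    # "integer"

w <- 1:6
dim(w) <- c(1L, 2L, 3L)   # dim attribute of length 3
is.matrix(w)              # FALSE
is.array(w)               # TRUE
```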
# e.g. create an array with height and width....
a <- array( 1:4 , dim=c(2,2) )
# this is a matrix!
class(a)
#[1] "matrix"
# And the class of the first column is an atomic vector of type integer....
class(a[,1])
#[1] "integer"
In the R language definition, there are six basic types of vector, one of which is "character". There really isn't a base "vector" type, but six different kinds of vectors that are all base types.
On the other hand, Matrix is a type of data structure.
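A quick way to see the six atomic vector types, and how class() reports the element type for plain vectors, is to apply typeof() and class() to one value of each type:

```r
# one value of each atomic vector type
vals <- list(TRUE, 1L, 1.5, 1i, "a", as.raw(1))
types <- vapply(vals, typeof, character(1))
classes <- vapply(vals, class, character(1))
types
# "logical" "integer" "double" "complex" "character" "raw"
classes
# "logical" "integer" "numeric" "complex" "character" "raw"
```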
Here's the best diagram I've found that lays out the class hierarchy used by the class function:
Although the class names don't correspond exactly with the results of the R class function, I believe the hierarchy is basically accurate. The key to your answer is that the class function only gives the root class in the hierarchy.
You will see that Vector is not a root class. The root class for your example would be StrVector, which corresponds to the "character" class, the class for a vector with character elements. In contrast, Matrix is itself a root class; hence, its class is "matrix".
I would like to insert an element into a list in R. The problem is that I would like it to have a name contained within a variable.
> list(c = 2)
$c
[1] 2
Makes sense. I obviously want a list item named 'c', containing 2.
> a <- "b"
> list(a = 1)
$a
[1] 1
Whoops. How do I tell R when I want it to treat a word as a variable, instead of as a name, when I am creating a list?
Some things I tried:
> list(eval(a)=2)
Error: unexpected '=' in "list(eval(a)="
> list(a, 2)
[[1]]
[1] "b"
[[2]]
[1] 2
> list(get(a) = 2)
Error: unexpected '=' in "list(get(a) ="
I know that if I already have a list() laying around, I could do this:
> ll<-list()
> ll[[a]]=456
> ll
$b
[1] 456
...But:
> list()[[a]]=789
Error in list()[[a]] = 789 : invalid (NULL) left side of assignment
How can I construct an anonymous list containing an element whose name is contained in a variable?
One option:
a <- "b"
> setNames(list(2),a)
$b
[1] 2
or the somewhat more "natural":
l <- list(2)
names(l) <- a
And if you look at the code of setNames, you'll see that these two methods are essentially identical: "the way to do this" in R really is to create the object and then set the names, in two steps. setNames is just a convenient way to do that.
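For a single expression without setNames, base R's structure() can set the names attribute while building the list; it gives the same object as the two-step approach:

```r
a <- "b"
# structure() builds the list and sets its names attribute in one call
l1 <- structure(list(2), names = a)
# the two-step version: create the list, then name it
l2 <- list(2)
names(l2) <- a
identical(l1, l2)
# [1] TRUE
```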