Store S4 objects in data.frame or data.table

I'm trying to put complex S4 objects (generated with the Seurat package) into a data.table, depending on the value of one of their attributes, using a function. (I read that it isn't possible to use a list or a data.frame for this, but I didn't find anything about the compatibility of data.table with S4 objects.)
These objects all come from a bigger object that I called dataset in the function I wrote:
subsets_by_cluster <- function(dataset){
  nclust = data.table(cluster_ID = c(rep(NA, length(unique(dataset@active.ident)))))
  for (i in length(nclust)){
    nclust[i] = dataset[, dataset@active.ident == unique(dataset@active.ident)[i]]
  }
  return(nclust)
}
I was expecting to get a data.table full of S4 objects, with one column and as many rows as there are different @active.ident values (cluster IDs).
But when I run it on my original dataset, I get the error
Error in `[<-.data.frame`(`*tmp*`, i, 1, value = new("Seurat", assays = list( : replacement has 2965 rows, data has 1
I also tried to do it manually with this kind of line:
nclust[1] = dataset[, dataset@active.ident == unique(dataset@active.ident)[1]]
but it didn't work either, prompting the error:
type 'S4' cannot be coerced to 'logical'
Storing each subset in its own variable works perfectly, but I would like my script to be able to handle varying numbers of clusters.
I was thinking about writing the subsets to files so they could be read back in later, but that seems far from an optimal solution.
Do you have suggestions?

First, create a simple S4 class (taken from Hadley Wickham's Advanced R):
setClass("Person",
slots = c(
name = "character",
age = "numeric"
)
)
As @John Paul mentions, you can create a few and store them in a list:
john <- new("Person", name = "John Smith", age = NA_real_)
jane <- new("Person", name = "Jane Smith", age = NA_integer_)
myPeeps <- list(john, jane)
Printing the list
> myPeeps
[[1]]
An object of class "Person"
Slot "name":
[1] "John Smith"
Slot "age":
[1] NA
[[2]]
An object of class "Person"
Slot "name":
[1] "Jane Smith"
Slot "age":
[1] NA
Since a data.frame is a special type of list, and as we see above a list element can be an S4 object, you can store them in a column as well. You just have to wrap the list in the I() function:
size <- 5
propsToMyPeeps <- data.frame(
  propsFrom = I(sample(myPeeps, size, replace = TRUE)),
  propsValue = sample.int(10, size, replace = TRUE),
  propsTo = I(sample(myPeeps, size, replace = TRUE))
)
By default, the print method for data.frame doesn't know how to coerce our Person objects to character strings, so printing the whole data.frame will cause an error. But if you subset the column, you can see all the objects are there.
> print(propsToMyPeeps$propsTo)
[[1]]
An object of class "Person"
Slot "name":
[1] "Jane Smith"
Slot "age":
[1] NA
[[2]]
An object of class "Person"
Slot "name":
[1] "John Smith"
Slot "age":
[1] NA
[[3]]
An object of class "Person"
Slot "name":
[1] "John Smith"
Slot "age":
[1] NA
[[4]]
An object of class "Person"
Slot "name":
[1] "Jane Smith"
Slot "age":
[1] NA
[[5]]
An object of class "Person"
Slot "name":
[1] "Jane Smith"
Slot "age":
[1] NA
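The slots can still be pulled back out of such a list column; a minimal sketch using the propsToMyPeeps data frame built above:
# Extract the "name" slot from every Person stored in the propsTo column
sapply(propsToMyPeeps$propsTo, function(p) p@name)
# With the sampled column printed above this gives:
# [1] "Jane Smith" "John Smith" "John Smith" "Jane Smith" "Jane Smith"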

You can do it like this:
library(Seurat)
library(data.table)
data(pbmc_small)
nclust = data.table(cluster_ID = levels(Idents(pbmc_small)))
nclust$data = lapply(nclust$cluster_ID, function(i){
  pbmc_small[, Idents(pbmc_small) == i]
})
And they can be accessed, for example to plot every cluster's subset:
library(gridExtra)
grid.arrange(grobs = lapply(nclust$data, DimPlot), ncol = 3)
Printing nclust shows one Seurat object stored per cluster:
   cluster_ID     data
1:          0 <Seurat>
2:          1 <Seurat>
3:          2 <Seurat>
The error in your code comes from first defining the column as plain NAs and then replacing them one at a time. Also, it should be for (i in 1:nrow(nclust)) instead of for (i in length(nclust)).
If you start by defining it as a list of NAs, it works:
subsets_by_cluster <- function(dataset){
  lvl = levels(Idents(dataset))
  nclust = data.table(
    cluster_ID = lvl,
    data = replicate(length(lvl), NA, simplify = FALSE)
  )
  for (i in 1:nrow(nclust)){
    nclust$data[[i]] = dataset[, Idents(dataset) == lvl[i]]
  }
  return(nclust)
}
subsets_by_cluster(pbmc_small)
   cluster_ID     data
1:          0 <Seurat>
2:          1 <Seurat>
3:          2 <Seurat>
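A single cluster's subset can then be pulled back out of the returned data.table; a small sketch (the object names are just the ones used above):
res <- subsets_by_cluster(pbmc_small)
# data.table filtering returns the list column; [[1]] unwraps the Seurat object
cluster1 <- res[cluster_ID == "1", data][[1]]
# equivalent base-style indexing:
cluster1 <- res$data[[which(res$cluster_ID == "1")]]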

Related

How to make Ops method compatible between two objects of a non base class

Let's imagine that I have a class "my" and I want to trigger certain behaviour when it is added to an object that has units (i.e. from the units package):
library(units)
my1 = structure(2, class = "my")
Ops.my <- function(e1, e2 = NULL) {
  ok <-
    switch(
      .Generic,
      `-` = ,
      `*` = ,
      `+` = ,
      `<=` = TRUE,
      FALSE
    )
  if (!ok) {
    stop(gettextf("%s not meaningful", sQuote(.Generic)))
  }
  get(.Generic)(as.integer(e1), as.integer(e2))
}
my1 + set_units(5, nm)
Currently, it gives me the following warning:
Warning message:
Incompatible methods ("Ops.my", "Ops.units") for "+"
But I actually want to handle "my" and "units" addition in a certain way, how do I do it?
I tried with something like Ops.my.units <- but it doesn't seem to work.
There doesn't seem to be a way to do this with Ops. From the docs:
The classes of both arguments are considered in dispatching any member of this group. For each argument its vector of classes is examined to see if there is a matching specific (preferred) or Ops method. If a method is found for just one argument or the same method is found for both, it is used. If different methods are found, there is a warning about ‘incompatible methods’
This is probably a good thing. Part of the benefit of an object-oriented system in a non-compiled language like R is that it helps preserve type safety. This stops you from accidentally adding apples to oranges, as we can see in the following example:
apples <- structure(2, class = "apples")
oranges <- structure(2, class = "oranges")
Ops.apples <- function(e1, e2) {
  value <- do.call(.Generic, list(as.integer(e1), as.integer(e2)))
  class(value) <- "apples"
  value
}
Ops.oranges <- function(e1, e2) {
  value <- do.call(.Generic, list(as.integer(e1), as.integer(e2)))
  class(value) <- "oranges"
  value
}
apples + apples
#> [1] 4
#> attr(,"class")
#> [1] "apples"
oranges + oranges
#> [1] 4
#> attr(,"class")
#> [1] "oranges"
apples + oranges
#> [1] 4
#> attr(,"class")
#> [1] "apples"
#> Warning message:
#> Incompatible methods ("Ops.apples", "Ops.oranges") for "+"
You can see that even here, we could just ignore the warning.
suppressWarnings(apples + oranges)
#> [1] 4
#> attr(,"class")
#> [1] "apples"
But hopefully you can see why this may not be good: we have added 2 apples and 2 oranges, and got back 4 apples.
Throughout R and its extension packages, there are numerous type-conversion functions such as as.integer, as.numeric, as.logical, as.character, as.difftime etc. These allow for some element of control when converting between types and performing operations on different types.
The "right" way to do this kind of thing is specifically convert one of the object types to the other in order to perform the operation:
as.my <- function(x) UseMethod("as.my")
as.my.default <- function(x) {
  value <- as.integer(x)
  class(value) <- "my"
  value
}
my1 + as.my(set_units(5, nm))
#> [1] 7
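A sketch of going the other direction, converting the "my" value to a units object before adding, so that dispatch only involves Ops.units (this assumes my1 simply wraps a plain number of nanometres):
# Strip the "my" class and attach units, then add two units objects
set_units(as.integer(my1), nm) + set_units(5, nm)
# should give a length of 7 nm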

Can an object accept different types for the same parameter in R?

Is it possible to have an object accept multiple types for the same parameter in R?
Say I want to create an object called taxpayer with an attribute id. One taxpayer may be identified by 1234 and another by Smith, John. How could I accommodate the fact that there are multiple types that could go in the id field?
I recognize that I could just make the parameter a character field and put in 1234 as a string and convert thereafter, but wanted to ask in case there was a work-around.
R has dynamic typing. The thing you are asking about is the default behavior. If you send in a number, it will treat it as a number. If you send in a character, it will treat it as a string.
Here is an example:
# Define taxpayer class
taxpayer <- function(id) {
  # Create new structure of class "taxpayer"
  t <- structure(list(), class = c("taxpayer", "list"))
  # Assign ID attribute
  t$id <- id
  return(t)
}
# Instantiate taxpayer with character ID
t1 <- taxpayer("Smith, John")
t1
# $id
# [1] "Smith, John"
#
# attr(,"class")
# [1] "taxpayer" "list"
# Check class
class(t1$id)
# [1] "character"
# Instantiate taxpayer with numeric ID
t2 <- taxpayer(1234)
t2
# $id
# [1] 1234
#
# attr(,"class")
# [1] "taxpayer" "list"
# Check class
class(t2$id)
# [1] "numeric"
The above answer was for an S3 style class. For an S4 style class, you can use the "ANY" type specifier, which will take any data type. Here is the same example as above modified for S4:
# Define taxpayer class
taxpayer <- setClass("taxpayer", slots = c(id = "ANY"))
# Instantiate taxpayer with character ID
t1 <- taxpayer(id = "Smith, John")
t1
# An object of class "taxpayer"
# Slot "id":
# [1] "Smith, John"
# Check class
class(t1@id)
# [1] "character"
# Instantiate taxpayer with numeric ID
t2 <- taxpayer(id = 1234)
t2
# An object of class "taxpayer"
# Slot "id":
# [1] 1234
# Check class
class(t2@id)
# [1] "numeric"

Recursive indexing of lists with variable index value per recursion step

Puh... even trying to frame the title properly already gives me a headache.
I have a config.yml with nested values and I would like to define an indexing function get_config() that accepts "path-like" value strings.
The "path entities" of the value string match the nested entity structure of the config file. Based on the path-like value the function should then go and grab the corresponding hierarchy entity (either "branches" or "leaves") from the config file.
Example
Suppose this is the structure of the config.yml:
default:
  column_names:
    col_id: "id"
    col_value: "value"
  column_orders:
    data_structure_a: [
      column_names/col_id,
      column_names/col_value
    ]
    data_structure_b: [
      column_names/col_value,
      column_names/col_id
    ]
Here's a parsed version for you to play around with:
x <- yaml::yaml.load(
  'default:
     column_names:
       col_id: "id"
       col_value: "value"
     column_orders:
       data_structure_a: [
         column_names/col_id,
         column_names/col_value
       ]
       data_structure_b: [
         column_names/col_value,
         column_names/col_id
       ]'
)
Accessing top-level entities is easy with config::get(value):
config::get("column_names")
# $col_id
# [1] "id"
#
# $col_value
# [1] "value"
config::get("column_orders")
# [1] "hello" "world"
But I would also like to access deeper entities, e.g. column_names: col_id.
In pseudo code:
config::get("column_names:col_id")
or
config::get("column_orders/data_structure_a")
The best I could come up with so far: relying on unlist()
get_config <- function(value, sep = ":") {
  if (value %>% stringr::str_detect(sep)) {
    value <- value %>% stringr::str_replace(sep, ".")
    configs <- config::get() %>% unlist()
    configs[value]
  } else {
    config::get(value)
  }
}
get_config("column_names")
# $col_id
# [1] "id"
#
# $col_value
# [1] "value"
get_config("column_names:col_id")
# column_names.col_id
# "id"
Though not elegant, it works for most use cases, but fails for unnamed list entities in the config file
get_config("column_orders:data_structure_a")
# <NA>
# NA
as my indexing approach doesn't play well with the result of unlist() on unnamed lists:
config::get() %>% unlist()
# column_names.col_id column_names.col_value
# "id" "value"
# column_orders.data_structure_a1 column_orders.data_structure_a2
# "column_names/col_id" "column_names/col_value"
# column_orders.data_structure_b1 column_orders.data_structure_b2
# "column_names/col_value" "column_names/col_id"
Thus, I'd like to "go recursive" but my brain says: "no way, dude"
Due diligence
This solution comes close (I guess).
But I keep thinking that I need something like purrr::map2_if() or purrr::pmap_if() (which AFAIK don't exist) instead of purrr::map_if(), as I need to not only traverse the list behind config::get() recursively, but also a listified version of value (e.g. via stringr::str_split(value, sep) %>% unlist() %>% as.list())?
You could also use purrr::pluck to index into a nested list by name if that is what you are after:
x <- yaml::yaml.load('
column_names:
  col_id: "id"
  col_value: "value"
column_orders:
  data_structure_a: [
    column_names/col_id,
    column_names/col_value
  ]
  data_structure_b: [
    column_names/col_value,
    column_names/col_id
  ]
nested_list:
  element_1:
    element_2:
      value: "hello world"
')
purrr::pluck(x, "column_names", "col_id")
#> [1] "id"
purrr::pluck(x, "column_names")
#> $col_id
#> [1] "id"
#>
#> $col_value
#> [1] "value"
purrr::pluck(x, "column_orders", "data_structure_a")
#> [1] "column_names/col_id" "column_names/col_value"
purrr::pluck(x, "column_names", "col_notthere")
#> NULL
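pluck can also be combined with the question's path-like strings by splitting on the separator and splicing the pieces into the call with do.call; a sketch (get_config2 is just a hypothetical name):
get_config2 <- function(x, value, sep = "/") {
  # turn "column_names/col_id" into c("column_names", "col_id")
  # and call pluck(x, "column_names", "col_id")
  do.call(purrr::pluck, c(list(x), strsplit(value, sep, fixed = TRUE)[[1]]))
}
get_config2(x, "column_names/col_id")
#> [1] "id"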
I came up with a solution based on Recall().
However, while digging through the internet to get here, I recall having read somewhere that Recall() is generally not a very (memory-)efficient way of doing recursion in R. I would also appreciate additional hints on how to do recursion the tidy way with purrr and friends.
Config file content
Being able to call get_config() implies that you have a config.yml file with the above content in your project's root directory as given by here::here(), but you can test get_list_element_recursively() with this workaround:
x <- yaml::yaml.load('
column_names:
  col_id: "id"
  col_value: "value"
column_orders:
  data_structure_a: [
    column_names/col_id,
    column_names/col_value
  ]
  data_structure_b: [
    column_names/col_value,
    column_names/col_id
  ]
nested_list:
  element_1:
    element_2:
      value: "hello world"
')
Function defs
get_config <- function(value, sep = "/") {
  get_list_element_recursively(
    config::get(),
    stringr::str_split(value, sep, simplify = TRUE)
  )
}
get_list_element_recursively <- function(
  lst,
  el,
  .el_trace = el,
  .level_trace = 1
) {
  # Reached leaf:
  if (!is.list(lst)) {
    return(lst)
  }
  # Element not in list:
  if (!(el[1] %in% names(lst))) {
    message("Current list branch:")
    # print(lst)
    message(str(lst))
    message("Trace of indexing vec (last element is invalid):")
    message(stringr::str_c(.el_trace[.level_trace], collapse = "/"))
    stop(stringr::str_glue("No such element in list: {el[1]}"))
  }
  lst <- lst[[ el[1] ]]
  if (!is.na(el[2])) {
    # Continue if there are additional elements in `el` vec,
    # extending the trace index by one level:
    Recall(lst, el[-1], .el_trace, .level_trace = 1:(max(.level_trace) + 1))
  } else {
    # Otherwise return last indexing result:
    lst
  }
}
Testing get_config()
get_config("column_names")
# $col_id
# [1] "id"
#
# $col_value
# [1] "value"
get_config("column_names/col_id")
# [1] "id"
get_config("column_names/col_nonexisting")
# Current list branch:
# List of 6
# $ col_id : chr "id"
# $ col_value : chr "value"
#
# Trace of indexing vec (last element is invalid):
# column_names/col_nonexisting
# Error in get_list_element_recursively(config::get(), stringr::str_split(value, :
# No such element in list: col_nonexisting
get_config("column_orders")
# $data_structure_a
# [1] "column_names/col_id" "column_names/col_value"
#
# $data_structure_b
# [1] "column_names/col_value" "column_names/col_id"
get_config("column_orders/data_structure_a")
# [1] "column_names/col_id" "column_names/col_value"
Testing get_list_element_recursively()
get_list_element_recursively(x, c("column_names"))
# $col_id
# [1] "id"
#
# $col_value
# [1] "value"
get_list_element_recursively(x, c("column_names", "col_id"))
# [1] "id"
get_list_element_recursively(x, c("column_names", "col_notthere"))
# Current list branch:
# List of 2
# $ col_id : chr "id"
# $ col_value: chr "value"
#
# Trace of indexing vec (last element is invalid):
# column_names/col_notthere
# Error in get_list_element_recursively(x$default, c("column_names", "col_notthere")) :
# No such element in list: col_notthere
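For the "tidy recursion" part of the question, one alternative that avoids Recall() entirely is to fold the path vector into the list with purrr::reduce(), indexing one level per step; a sketch (get_by_path is a hypothetical helper without the nice error trace above):
get_by_path <- function(lst, path) {
  purrr::reduce(path, function(acc, nm) acc[[nm]], .init = lst)
}
get_by_path(x, c("column_names", "col_id"))
# [1] "id"
get_by_path(x, c("nested_list", "element_1", "element_2", "value"))
# [1] "hello world"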

Handling JSON data in R

So I have nested JSON data as shown below,
{
  "School_Days": [
    {
      "ts": 1234,
      "val": "ABC"
    },
    {
      "ts": 0987,
      "val": "EFG"
    }
  ]
}
When I create a data frame from it, the result has 4 columns and 1 row instead of 2 columns and 2 rows.
Below is my code for parsing the JSON data:
sc_data <- content(school_json,"parsed", "application/json","Accept:
application/json")
sc_df <- data.frame(sc_data, stringsAsFactors = FALSE)
Current data frame:
School_Days.ts School_Days.Val School_Days.ts1 School_Days.val1
          1234             ABC            0987              EFG
Expected data frame:
School_Days.ts School_Days.Val
          1234             ABC
          0987             EFG
NOTE: I am currently fetching the JSON data from a REST API GET call and storing it in school_json.
Also, typeof(school_json) returns list, and the object has the following format:
$School_Days
$School_Days[[1]]
$School_Days[[1]]$ts
[1] 1234
$School_Days[[1]]$Val
[1] "ABC"
$School_Days[[2]]
$School_Days[[2]]$ts
[1] 0987
$School_Days[[2]]$Val
[1] "EFG"
So here is the solution I found to my question.
I made the following changes to my content() call:
sc_data <- content(school_json,"text", "application/json")
sc_df <- fromJSON(sc_data, flatten = TRUE)
sc_df <- data.frame(sc_df,stringAsFactors = FALSE)
Instead of retrieving the content as "parsed", I retrieved it as "text", which gave me the JSON data in the form of a string.
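A minimal standalone sketch of the same idea with the JSON from the question (note that 0987 is not valid JSON because of the leading zero, so 987 is used here):
library(jsonlite)
txt <- '{"School_Days":[{"ts":1234,"val":"ABC"},{"ts":987,"val":"EFG"}]}'
# simplifyDataFrame (the default) turns the array of objects into a data frame
fromJSON(txt)$School_Days
#     ts val
# 1 1234 ABC
# 2  987 EFG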

Mapping different args to a list of data frames

This might be a job for purrr, or maybe I just need to change the layout of my function or my data here.
I've got a function that takes 2 arguments which I'm trying to apply across a list of data frames. One of the arguments should be the list element's name, whereas the other will be a list component (a value in the list).
Book List - Book title and Chapters
my_list <- list(Book1 = c("ABC", "DEF", "GHI"), Book2 = c("ABB", "BCC"), Book3 = c("AAA", "BBB", "CCC", "DDD"))
Function with arg for list component and value
my_function <- function(Book, Chapter) {
  path <- paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
  ## Would have API call here, but let's just print path
  path
}
I can easily call this on an individual item by specifying the arguments in map:
map(my_list$Book1, function(Chapter) my_function(Chapter = Chapter, Book = "Book1"))
Output:
[[1]]
[1] "www.fake-website.comctp:Book1/ABC"
[[2]]
[1] "www.fake-website.comctp:Book1/DEF"
[[3]]
[1] "www.fake-website.comctp:Book1/GHI"
But how do I apply the function to each list element, pairing each Book name with its chapter values?
I'm hoping for something like
[[1]]
[1] "www.fake-website.comctp:Book1/ABC"
[[2]]
[1] "www.fake-website.comctp:Book1/DEF"
[[3]]
[1] "www.fake-website.comctp:Book1/GHI"
[[4]]
[1] "www.fake-website.comctp:Book2/ABB"
[[5]]
[1] "www.fake-website.comctp:Book2/BCC"
[[6]]
[1] "www.fake-website.comctp:Book3/AAA"
[[7]]
[1] "www.fake-website.comctp:Book3/BBB"
[[8]]
[1] "www.fake-website.comctp:Book3/CCC"
[[9]]
[1] "www.fake-website.comctp:Book3/DDD"
My function actually isn't simply pasting Books and Chapters, but getting a bunch of info from the API and parsing it.
However, what I need help with is mapping across the list of data frames and pairing the book arg with the chapter arg.
You can use purrr::imap, which passes the names of the list as second argument to the function:
library(purrr)
imap(my_list, ~ my_function(..2, ..1))
# or imap(my_list, ~ my_function(.y, .x))
$Book1
[1] "www.fake-website.comctp:Book1/ABC" "www.fake-website.comctp:Book1/DEF"
[3] "www.fake-website.comctp:Book1/GHI"
$Book2
[1] "www.fake-website.comctp:Book2/ABB" "www.fake-website.comctp:Book2/BCC"
$Book3
[1] "www.fake-website.comctp:Book3/AAA" "www.fake-website.comctp:Book3/BBB"
[3] "www.fake-website.comctp:Book3/CCC" "www.fake-website.comctp:Book3/DDD"
And if you switch the arguments of your function, Book and Chapter, you can simply do:
my_function <- function(Chapter, Book) {
  path <- paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
  path
}
imap(my_list, my_function)
$Book1
[1] "www.fake-website.comctp:Book1/ABC" "www.fake-website.comctp:Book1/DEF"
[3] "www.fake-website.comctp:Book1/GHI"
$Book2
[1] "www.fake-website.comctp:Book2/ABB" "www.fake-website.comctp:Book2/BCC"
$Book3
[1] "www.fake-website.comctp:Book3/AAA" "www.fake-website.comctp:Book3/BBB"
[3] "www.fake-website.comctp:Book3/CCC" "www.fake-website.comctp:Book3/DDD"
