I am having trouble with an aggregation involving an ObjectId. These are the pipeline stages:
{'$match' : {'likes.id' : ObjectId('50e99acfb35de75402002023')}}
{'$project' : {'likes.id' : 1, '_id' : 0}}
{'$unwind' : '$likes'}
{'$group' : {'_id' : '$likes.id', 'count' : {'$sum':1}}}
{'$sort' : {'_id' : 1}}
My attempt to write it in R using rmongodb is:
pipe_1 <- mongo.bson.from.JSON('{"$match" : {"likes.id" : { "$oid" : "50e99acfb35de75402002023" }}}')
pipe_2 <- mongo.bson.from.JSON('{"$project" : {"likes.id" : 1, "_id" : 0}}')
pipe_3 <- mongo.bson.from.JSON('{"$unwind" : "$likes"}')
pipe_4 <- mongo.bson.from.JSON('{"$group" : {"_id" : "$likes.id", "count" : {"$sum":1}}}')
pipe_5 <- mongo.bson.from.JSON('{"$sort" : {"count" : 1}}')
pipes <- list(pipe_1,pipe_2,pipe_3,pipe_4,pipe_5)
result <- mongo.aggregation(mongo, ns = "analytics.analytics_profiles", pipeline =pipes)
Which returns
mongoDB error: 10
which corresponds to the BSON invalid error code.
I think the problem is with the match by ObjectId: the first pipeline alone gives the same error.
How can I fix this?
Extra: how can this be done using mongolite instead?
You really should not use "dot notation" for an "array" key in the aggregation pipeline; what you are doing is still perfectly valid, but you can reduce the array elements to just the "id" values with $project.
Also, it looks like you need to construct your BSON for matching the ObjectId separately:
oid <- mongo.oid.from.string("50e99acfb35de75402002023")
pipe_1 <- mongo.bson.from.list(list('$match' = list('likes.id' = oid)))
pipe_2 <- mongo.bson.from.JSON('{"$project" : {"likes" : "$likes.id", "_id" : 0}}')
pipe_3 <- mongo.bson.from.JSON('{"$unwind" : "$likes"}')
pipe_4 <- mongo.bson.from.list(list('$match' = list('likes' = oid)))
pipe_5 <- mongo.bson.from.JSON('{"$group" : {"_id" : "$likes", "count" : {"$sum":1}}}')
pipe_6 <- mongo.bson.from.JSON('{"$sort" : {"count" : 1}}')
That now makes "likes" an array of just values and not a "key/value" pair. So you don't need "$likes.id" in later stages. Just reference by "$likes".
--
For the record, I went through this with a sample document in a collection like the one you seem to have defined:
{
    "_id" : ObjectId("50e99acfb35de75402002023"),
    "likes" : [
        { "id" : ObjectId("50e99acfb35de75402002023") },
        { "id" : ObjectId("50e99acfb35de75402002023") },
        { "id" : ObjectId("50e99acfb35de75402002023") },
        { "id" : ObjectId("50e99acfb35de75402002023") }
    ]
}
Then I actually defined the pipeline in R using the mongo.bson.from.list constructors like so:
pipeline <- list(
  mongo.bson.from.list(list(
    '$match' = list(
      'likes.id' = mongo.oid.from.string("50e99acfb35de75402002023")
    )
  )),
  mongo.bson.from.list(list(
    '$project' = list(
      '_id' = 0,
      'likes' = '$likes.id'
    )
  )),
  mongo.bson.from.list(list(
    '$unwind' = '$likes'
  )),
  mongo.bson.from.list(list(
    '$match' = list(
      'likes' = mongo.oid.from.string("50e99acfb35de75402002023")
    )
  )),
  mongo.bson.from.list(list(
    '$group' = list(
      '_id' = '$likes',
      'count' = list( '$sum' = 1 )
    )
  )),
  mongo.bson.from.list(list(
    '$sort' = list( 'count' = 1 )
  ))
)
mongo.aggregation(mongo, "test.posts", pipeline)
And for me that correctly adds all matching entries within the array.
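If it helps, getting the reply back into plain R structures should just be a matter of converting the returned BSON; a small hedged sketch, assuming the server answers with the classic aggregate command reply that wraps the documents in a "result" array:
res <- mongo.aggregation(mongo, "test.posts", pipeline)
# The command reply is a single BSON document; its "result" field holds the matching docs.
mongo.bson.to.list(res)$result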
Also "note" the additional match stage here after $unwind. The first $match in aggregation matches the "document", but this does nothing to "filter" the array content, so items in the array still contain things that do not match the "id" value you asked for.
So after processing $unwind you need to "filter" with $match again once the array has been denormalized. There are actually more efficient ways of doing this and they are well documented on this site even: Retrieve only the queried element in an object array in MongoDB collection
But you should also really be using the bson.from.list and general list() contructors for the structure rather than converting from JSON.
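As for the "Extra" part of the question: that is not covered above, but a hedged sketch of the equivalent pipeline with mongolite might look roughly like this. mongolite's $aggregate() takes the pipeline as a JSON array string and understands MongoDB extended JSON, so the $oid form can be used directly; the connection details below are assumptions you will need to adjust:
library(mongolite)

# Connection parameters are assumptions; adjust db/collection/url to your setup.
con <- mongo(collection = "analytics_profiles",
             db         = "analytics",
             url        = "mongodb://localhost")

con$aggregate('[
  {"$match"   : {"likes.id" : {"$oid" : "50e99acfb35de75402002023"}}},
  {"$project" : {"_id" : 0, "likes" : "$likes.id"}},
  {"$unwind"  : "$likes"},
  {"$match"   : {"likes" : {"$oid" : "50e99acfb35de75402002023"}}},
  {"$group"   : {"_id" : "$likes", "count" : {"$sum" : 1}}},
  {"$sort"    : {"count" : 1}}
]')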
Problem
I am working on a project with various custom-made Reference Classes (RC). I would like to save certain class instances at the end of my program in order to export them to a database. I tried using jsonlite::toJSON(account) but I received the error message
Error: No method for S4 class:BankAccount
Class
I have the following class
BankAccount <- setRefClass('BankAccount',
  fields = list(
    balance = 'numeric',
    ledger = 'data.frame'
  ),
  methods = list(
    deposit = function (x) {
      x <- max(x,0)
      balance <<- balance + x
      ledger <<- data.frame(
        Date = c(ledger$Date, as.character(Sys.time())),
        Type = c(ledger$Type, 'Deposit'),
        Amount = c(ledger$Amount, x),
        stringsAsFactors = FALSE
      )
    },
    withdraw = function (x) {
      x <- max(x,0)
      balance <<- balance - x
      ledger <<- data.frame(
        Date = c(ledger$Date, as.character(Sys.time())),
        Type = c(ledger$Type, 'Withdrawal'),
        Amount = c(ledger$Amount, x),
        stringsAsFactors = FALSE
      )
    }
  ))
Instance
And here is an instance of that class
account <- BankAccount$new(balance = 100)
account$deposit(1000)
Sys.sleep(5)
account$withdraw(97.89)
account
Reference class object of class "BankAccount"
Field "balance":
[1] 1002.11
Field "ledger":
Date Type Amount
1 2018-12-31 16:21:20 Deposit 1000.00
2 2018-12-31 16:21:26 Withdrawal 97.89
JSON
Now I would like to save it as a JSON file of the following form (there might be a typo in the JSON - I'm not that familiar with the format):
{
  "balance": "double",
  "ledger": {
    "Date": "string",
    "Type": "string",
    "Amount": "double"
  }
}
PS
I also tried without the field ledger (which is of class data.frame) but it still did not work.
Edit
Here is the output of jsonlite::serializeJSON(account)
{"type":"S4","attributes":{".xData":{"type":"environment","attributes":{},"value":{}}},"value":{"class":"BankAccount","package":".GlobalEnv"}}
As you can see, it seems to save only information about the class BankAccount but nothing about the instance account (the balance value etc. are missing).
Here is a workaround (as suggested by @StefanF): one can add methods to the class.
# converts to JSON
toJSON = function (prettify = TRUE) {
  instanceJSON <- jsonlite::toJSON(list(balance = balance, ledger = ledger))
  if (prettify) instanceJSON <- jsonlite::prettify(instanceJSON)
  instanceJSON
}
# saves as .json, e.g. path = 'C:/Test/instance.json'
saveJSON = function (path) {
  instanceJSON <- toJSON()
  writeLines(instanceJSON, path)
}
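A hedged sketch of how these methods could be attached and used: RefClass generators let you add methods after the fact with $methods() (only instances created afterwards pick them up), so one way to wire the workaround up is:
BankAccount$methods(
  toJSON = function(prettify = TRUE) {
    # Serialize only the fields we care about.
    instanceJSON <- jsonlite::toJSON(list(balance = balance, ledger = ledger))
    if (prettify) instanceJSON <- jsonlite::prettify(instanceJSON)
    instanceJSON
  },
  saveJSON = function(path) {
    # path is just an example argument, e.g. "instance.json"
    writeLines(toJSON(), path)
  }
)

account <- BankAccount$new(balance = 100)
account$deposit(1000)
account$toJSON()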
The solution is not ideal as
there is a bit of labor as one needs to specify which of the fields should be incorporated
also, if a field of BankAccount is of class MyClass (another custom class), then you need to either specify which fields of MyClass are relevant or write a toJSON for MyClass as well
How can I sort/order R6 objects based on an own function value or a compare function value?
I have made up a little example with rectangles that I would like to sort by their area:
library('R6')
Rectangle <- R6Class(
  "Rectangle",
  public = list(
    initialize = function(width, height) {
      private$width = width
      private$height = height
    },
    get_area = function() {
      private$width * private$height
    }
  ),
  private = list(
    width = NULL,
    height = NULL
  )
)
array_of_rects = c( Rectangle$new(7,3), Rectangle$new(5,2), Rectangle$new(3,4))
I would like to sort array_of_rects by their area given by the get_area() function.
I tried different things like:
`>.Rectangle` <- function(e1, e2) { e1[[1]]$get_area() > e2[[1]]$get_area() }
`==.Rectangle` <- function(e1, e2) { e1[[1]]$get_area() == e2[[1]]$get_area() }
sort(array_of_rects)
but without luck (I get an 'x' must be atomic error message).
I tried without the [[1]] (like this e1$get_area()) but this didn't work either.
Searched around but haven't found anything leading me to a solution.
Any suggestions? Thanks in advance!
Well, inspired by https://stackoverflow.com/a/23647092/1935801
I found the following nice and elegant solution:
area = function(rect){ rect$get_area() }
sorted_rects = array_of_rects[ order( sapply(array_of_rects, FUN = area) ) ]
At the end of the day, it works with R6 just like with any other class/object.
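Just to double-check the result, a small usage snippet of my own:
sapply(sorted_rects, function(rect) rect$get_area())
# [1] 10 12 21  -- the 5x2, 3x4 and 7x3 rectangles in ascending order of area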
I've been working on an R package that is just a REST API wrapper for a graph database. I have a function createNode that returns an object with class node and entity:
# Connect to the db.
graph = startGraph("http://localhost:7474/db/data/")
# Create two nodes in the db.
alice = createNode(graph, name = "Alice")
bob = createNode(graph, name = "Bob")
> class(alice)
[1] "node" "entity"
> class(bob)
[1] "node" "entity"
I have another function, createRel, that creates a relationship between two nodes in the database. It is specified as follows:
createRel = function(fromNode, type, toNode, ...) {
  UseMethod("createRel")
}

createRel.default = function(fromNode, ...) {
  stop("Invalid object. Must supply node object.")
}

createRel.node = function(fromNode, type, toNode, ...) {
  params = list(...)
  # Check if toNode is a node.
  stopifnot("node" %in% class(toNode))
  # Making REST API calls through RCurl and stuff.
}
The ... allows the user to add an arbitrary amount of properties to the relationship in the form key = value. For example,
rel = createRel(alice, "KNOWS", bob, since = 2000, through = "Work")
This creates an (Alice)-[KNOWS]->(Bob) relationship in the db, with the properties since and through and their respective values. However, if a user specifies properties with keys from or to in the ... argument, R gets confused about the classes of fromNode and toNode.
Specifying a property with key from creates confusion about the class of fromNode. It is using createRel.default:
> createRel(alice, "KNOWS", bob, from = "Work")
Error in createRel.default(alice, "KNOWS", bob, from = "Work") :
Invalid object. Must supply node object.
3 stop("Invalid object. Must supply node object.")
2 createRel.default(alice, "KNOWS", bob, from = "Work")
1 createRel(alice, "KNOWS", bob, from = "Work")
Similarly, if a user specifies a property with key to, there is confusion about the class of toNode, and execution stops at the stopifnot():
Error: "node" %in% class(toNode) is not TRUE
4 stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
ch), call. = FALSE, domain = NA)
3 stopifnot("node" %in% class(toNode))
2 createRel.node(alice, "KNOWS", bob, to = "Something")
1 createRel(alice, "KNOWS", bob, to = "Something")
I've found that explicitly setting the parameters in createRel works fine:
rel = createRel(fromNode = alice,
type = "KNOWS",
toNode = bob,
from = "Work",
to = "Something")
# OK
But I am wondering how I need to edit my createRel function so that the following syntax will work without error:
rel = createRel(alice, "KNOWS", bob, from = "Work", to = "Something")
# Errors galore.
The GitHub user who opened the issue mentioned it is most likely a conflict with setAs on dispatch, which has arguments called from and to. One solution is to get rid of ... and change createRel to the following:
createRel = function(fromNode, type, toNode, params = list()) {
  UseMethod("createRel")
}

createRel.default = function(fromNode, ...) {
  stop("Invalid object. Must supply node object.")
}

createRel.node = function(fromNode, type, toNode, params = list()) {
  # Check if toNode is a node.
  stopifnot("node" %in% class(toNode))
  # Making REST API calls through RCurl and stuff.
}
But, I wanted to see if I had any other options before making this change.
Not really an answer, but...
The problem is that the user-provided argument 'from' is being (partially) matched to the formal argument 'fromNode'.
f = function(fromNode, ...) fromNode
f(1, from=2)
## [1] 2
The rules are outlined in section 4.3.2 of RShowDoc('R-lang'): named arguments are matched exactly first, then partially, and then unnamed arguments are assigned by position.
It's hard to know how to enforce exact matching, other than using single-letter argument names! Actually, for a generic this might not be as trite as it sounds -- x is a pretty generic variable name. If 'from' and 'to' were common arguments to ... you could change the argument list to "fromNode, type, toNode, ..., from, to", check for missing(from) in the body of the function, and act accordingly (see the sketch below); I don't think this would be pleasant, and the user would invariably provide an argument 'fro'.
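To make that concrete, here is a hedged sketch of what it could look like (the mock node objects and the returned params list are mine, standing in for the real REST logic): with explicit from and to formals placed after ..., user-supplied from = / to = exact-match those formals instead of partially matching fromNode / toNode.
createRel <- function(fromNode, type, toNode, ..., from, to) {
  UseMethod("createRel")
}

createRel.default <- function(fromNode, ...) {
  stop("Invalid object. Must supply node object.")
}

createRel.node <- function(fromNode, type, toNode, ..., from, to) {
  params <- list(...)
  # Fold any 'from'/'to' properties captured by the explicit formals back into the list.
  if (!missing(from)) params$from <- from
  if (!missing(to))   params$to   <- to
  stopifnot("node" %in% class(toNode))
  params  # placeholder: the real method would make the REST call here
}

# Mock nodes purely for illustration:
alice <- structure(list(name = "Alice"), class = c("node", "entity"))
bob   <- structure(list(name = "Bob"),   class = c("node", "entity"))

createRel(alice, "KNOWS", bob, since = 2000, from = "Work", to = "Something")
# returns a list with $since, $from and $to -- dispatch now lands on createRel.node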
While turning partial matching into warnings or errors globally (e.g. options(warnPartialMatchArgs = TRUE, warn = 2)) might be helpful in debugging (though by then you'd probably know what you were looking for!), it doesn't help the package author who is trying to write code that works for users in general.
It might be reasonable to ask on the R-devel mailing list whether it might be time for this behavior to be changed (on the 'several releases' time scale); partial matching probably dates as a 'convenience' from the days before tab completion.
I want a Reference Class field that can hold one of several classes. For example:
Mycls = setRefClass(
  "Mycls",
  fields = list(
    # this is just a mock up
    colorvec = "numeric" | "factor" | "matrix"
  )
)
In this example, I want to allow colorvec to be numeric or factor or matrix.
Is there a way to do this?
Three possibilities.
Use the ANY type.
m2 = setRefClass("m2",
fields = list(x="ANY")
)
which, as the name suggests, allows you to have any type.
Create another class that only accepts numerics/factors/matrices:
setClass("mult", representation(x="ANY"))
setValidity("mult",
function(object)
is.numeric(object#x) || is.factor(object#x) || is.matrix(object#x)
)
m3 = setRefClass("m3", fields = list(x="mult"))
So
bit = new("mult", x=10)
m3$new(x=bit)
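For illustration (a usage sketch of my own), a factor wrapped the same way also passes the validity check, while a plain character does not:
m3$new(x = new("mult", x = factor(c("a", "b"))))   # ok: factors satisfy the validity check
try(new("mult", x = "nope"))                       # error: fails the validity check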
Have a function as your input and check the types. Notice that the x field doesn't actually store any data, it just checks and returns the internal value. You could create a simple show method to hide the internal field.
m4 = setRefClass("m4",
fields=list(x = function(y){
if(!missing(y) && !is.null(y)) {
if(!(is.numeric(y))){
stop("Wrong type")
}
internal <<- y
} else internal}
,
internal="ANY"
))
m4$new(x=10)    # works
m4$new(x="10")  # error: Wrong type
I'm a huge fan of S4 Reference Classes as they allow for a hybrid programming style (functional/pass-by-value vs. oop/pass-by-reference) and thus increase flexibility dramatically.
However, I think I just came across an undesired behavior with respect to the way R scans through environments/frames when you ask it to retrieve a certain field value via method $field() (see help page). The problem is that R also seems to look in enclosing environments/frames if the desired field is not found in the actual local/target environment (which would be the environment making up the S4 Reference Class), i.e. it's just like running get(<objname>, inherits=TRUE) (see help page).
Actual question
In order to have R just look in the local/target environment, I was thinking something like $field(name="<fieldname>", inherits=FALSE) but $field() doesn't have a ... argument that would allow me to pass inherits=FALSE along to get() (which I'm guessing is called somewhere along the way). Is there a workaround to this?
Code Example
For those interested in more details: here's a little code example illustrating the behavior
setRefClass("A", fields=list(a="character"))
x <- getRefClass("A")$new(a="a")
There is a field a in class A, so it's found in the target environment and the value is returned:
> x$field("a")
[1] "a"
Things look different if we try to access a field that is not a field of the reference class but happens to have a name identical to that of some other object on the workspace/search path (in this case "lm"):
require("MASS")
> x$field("lm")
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
ret.x <- x
ret.y <- y
[omitted]
if (!qr)
z$qr <- NULL
z
}
<bytecode: 0x02e6b654>
<environment: namespace:stats>
Not really what I would expect at this point. IMHO an error or at least a warning would be much better, or opening up method $field() to arguments that can be passed along to other functions via .... I'm guessing that get() is called somewhere along the way when calling $field(), so something like this could prevent the above behavior from occurring:
x$field("lm", inherits=FALSE)
Workaround: own proposal
This should do the trick, but maybe there's something more elegant that doesn't involve the specification of a new method on top of $field():
setRefClass("A", fields=list(a="character"),
methods=list(
myField=function(name, ...) {
# VALIDATE NAME //
if (!name %in% names(getRefClass(class(.self))$fields())) {
stop(paste0("Invalid field name: '", name, "'"))
}
# //
.self$field(name=name)
}
)
)
x <- getRefClass("A")$new(a="a")
> x$myField("a")
[1] "a"
> x$myField("lm")
Error in x$myField("lm") : Invalid field name: 'lm'
The default field() method can be replaced with your own. So adding an inherits argument to avoid the enclosing frames is simply a matter of grabbing the existing x$field definition and adding the argument:
setRefClass( Class="B",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
if( is.na( match( name, names( .refClassDef#fieldClasses ) ) ) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
assign(name, value, envir = .self)
}
}
),
)
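A quick usage check of the class above (lines of my own; the error text comes straight from get()):
b <- getRefClass("B")$new(a="a")
b$field("a")                      # [1] "a" -- found in the object itself
b$field("lm", inherits=FALSE)     # error: object 'lm' not found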
Or you could have a nice error message with a little rearranging
setRefClass( Class="C",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( is.na( match( name, names( .refClassDef#fieldClasses ) ) ) &&
( !missing(value) || inherits==FALSE) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
assign(name, value, envir = .self)
}
}
),
)
Since you can define any of your own methods to replace the defaults, pretty much any logic you want can be implemented for your refclasses. Perhaps raise an error if the variable is acquired via inheritance but its mode matches one of c("expression", "name", "symbol", "function"), and a warning if it doesn't directly match the local refClass field names?
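That last idea might look roughly like this (my own sketch, simplified, with assumed names and messages): a hard error for inherited language-like objects, and a warning for anything else found only by inherited lookup.
setRefClass( Class="D",
  fields= list( a="character" ),
  methods= list(
    field = function(name, value, inherits=TRUE) {
      isOwnField <- !is.na( match( name, names( .refClassDef@fieldClasses ) ) )
      if( missing(value) ) {
        val <- get( name, envir=.self, inherits=inherits )
        if( !isOwnField ) {
          # Found only by inherited lookup: error for language-like objects, warn otherwise.
          if( mode(val) %in% c("expression", "name", "symbol", "function") )
            stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
          warning(gettextf("%s was found by inherited lookup, not in this class",
            sQuote(name)), domain = NA)
        }
        val
      } else {
        if( !isOwnField )
          stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
        assign(name, value, envir = .self)
      }
    }
  )
)

getRefClass("D")$new(a="a")$field("lm")   # error: 'lm' is a function found only by inheritance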