mongolite best practices - r

I am developing an application using OpenCPU and R, and I am new to both. I am using the mongolite package to connect to MongoDB. There are multiple calls to the DB, and connecting every time takes a long time. On top of that, data processing, plotting, etc. add quite a lot to the time it takes to load the page with the generated plot. In many cases I have to fetch data from multiple collections for a single plot.
I noticed that I am able to save 3-4 seconds (per connection) if I don't connect to the DB every time, but instead reuse an existing connection.
It would be great if anyone could guide me on the best way to check whether a connection to the DB is already established.
Let me brief you on what I have done so far.
Here is my connect_to_db.R file
library(mongolite)
dbConnection <- NULL
connect_mongodb <- function() {
  # Connect to the DB only if the connection is NULL,
  # otherwise return the existing global connection object
  if (is.null(dbConnection)) {
    m <- mongo(collection = myCollection, db = myDb, url = myUrl)
    assign("dbConnection", m, envir = .GlobalEnv)
  }
  dbConnection
}
It serves the purpose when I source the file and run it from the R console. But when I use it on my OpenCPU server, I call the connect_mongodb method from another R method that I use for plotting, and I call that plotting method from a JavaScript file as follows.
var req = $("#plot").rplot(myPlottingMethod, options).fail(function(){
    alert("Error loading results");
});
This way, my variable "dbConnection" is unknown to the method.
I tried a few other approaches, such as <<-, which I have read isn't a good way to do it. I also tried using exists() in place of is.null().
I tried another option: calling my connect_mongodb method from my JavaScript file using an ocpu.rpc call, with the idea of passing the result as an argument to the R methods in the rplot calls.
var req = ocpu.rpc("connect_mongodb", {})
Since connecting with mongolite doesn't return a JSON object, this attempt also failed with the error below:
Failed to get JSON response for http://localhost:xxxx/ocpu/tmp/x07c82b16bb/
Sadly, toJSON from jsonlite and rjson did not help in converting the db object to JSON.
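One pattern worth trying (a minimal sketch, not tested against OpenCPU; connect_mongodb, myCollection, myDb and myUrl are the names from the question, while .db_cache is made up here) is to cache the connection in a package-local environment instead of .GlobalEnv, so the lookup no longer depends on the global workspace of whichever session runs the plotting method:
library(mongolite)

# Environment that lives alongside the package code and acts as a connection cache
.db_cache <- new.env(parent = emptyenv())

connect_mongodb <- function() {
  if (is.null(.db_cache$conn)) {
    # First call in this R process: open the connection and cache it
    .db_cache$conn <- mongo(collection = myCollection, db = myDb, url = myUrl)
  }
  .db_cache$conn
}
Note that this only avoids reconnecting within a single R process; whether OpenCPU reuses that process across requests depends on its configuration.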

Related

In R, how to check if an object exists in memory from inside the object itself?

I've been running into this problem in a few separate cases now and I'd like your input. In R, objects can be deleted or overwritten, but if they are backed by compiled code (e.g. Rcpp-based libraries), the underlying resource keeps doing what it does.
For example, when connecting to a websocket using the websocket package:
ws <- WebSocket$new(paste0(gate, "/?v=6&encoding=json"), autoConnect = FALSE)
ws$onMessage(function(event) {
  print(event)
})
ws$connect()
The object ws is now my only way to control the websocket, and if it is deleted or overwritten, there is no way to make it disconnect except to restart R.
A similar issue arises when using the later package:
BumpUp <- function(.self) {
  .self$iter <- .self$iter + 1
  message("Value bumped up to ", .self$iter)
  if (.self$iter < 10) {
    later::later(~.self$bump(), delay = 1)
  }
}
MakeTestObject <- setRefClass("testobject", fields = list(iter = "numeric"), methods = list(bump = BumpUp))
testobj <- MakeTestObject(iter = 0)
testobj$bump()
rm(testobj)
The loop associated with testobj continues to repeat, despite the fact that the object itself has been removed from memory.
Is there any way to make a reference class object check if it still exists in memory? More generally, is it possible for the object to know its name in memory?
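One workaround to consider (a minimal sketch under assumed names, not the poster's original code): instead of having the object ask whether it still exists, give it an explicit kill switch that the scheduled callback checks before rescheduling itself, and flip that switch before dropping the last reference:
library(later)

MakeTestObject <- setRefClass(
  "testobject",
  fields = list(iter = "numeric", stopped = "logical"),
  methods = list(
    bump = function() {
      if (isTRUE(stopped)) return(invisible(NULL)) # kill switch: stop rescheduling
      iter <<- iter + 1
      message("Value bumped up to ", iter)
      if (iter < 10) later::later(function() .self$bump(), delay = 1)
    },
    stop_loop = function() {
      stopped <<- TRUE
    }
  )
)

testobj <- MakeTestObject(iter = 0, stopped = FALSE)
testobj$bump()
testobj$stop_loop() # call this before rm(testobj); the pending callback then exits quietly
This does not tell the object its own name, but it avoids relying on garbage collection to break the loop.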

Generically forwarding a GRPC call

I have a GRPC API where, following a refactor, a few packages were renamed. This includes the package declaration in one of our proto files that defines the API. Something like this:
package foo;

service BazApi {
  rpc FooEventStream(stream Ack) returns (stream FooEvent);
}
which was changed to
package bar;

service BazApi {
  rpc FooEventStream(stream Ack) returns (stream FooEvent);
}
The server side is implemented using grpc-java with Scala and Monix on top.
This all works fine for clients that use the new proto files, but for old clients that were built on top of the old proto files, this causes problems: UNIMPLEMENTED: Method not found: foo.BazApi/FooEventStream.
The actual data format of the messages passed over the GRPC API has not changed, only the package.
Since we need to keep backwards compatibility, I've been looking into a way to make the old clients work while keeping the name change.
I was hoping to make this work with a generic ServerInterceptor which would be able to inspect an incoming call, see that it's from an old client (we have the client version in the headers) and redirect/forward it to the renamed service. (Since it's just the package name that changed, this is easy to figure out e.g. foo.BazApi/FooEventStream -> bar.BazApi/FooEventStream)
However, there doesn't seem to be an elegant way to do this. I think it's possible by starting a new ClientCall to the correct endpoint, and then handling the ServerCall within the interceptor by delegating to the ClientCall, but that will require a bunch of plumbing code to properly handle unary/clientStreaming/serverStreaming/bidiStreaming calls.
Is there a better way to do this?
If you can easily change the server, you can have it support both names simultaneously by registering your service twice, with two different service descriptors.
Every service has a bindService() method that returns a ServerServiceDefinition. You can pass the definition to the server via the normal serverBuilder.addService().
So you could get the normal ServerServiceDefinition and then rewrite it to the new name and then register the new name.
BazApiImpl service = new BazApiImpl();
serverBuilder.addService(service); // register the new name, "bar.BazApi"
ServerServiceDefinition barDef = service.bindService();
ServerServiceDefinition.Builder fooDefBuilder = ServerServiceDefinition.builder("foo.BazApi");
for (ServerMethodDefinition<?,?> barMethodDef : barDef.getMethods()) {
  MethodDescriptor desc = barMethodDef.getMethodDescriptor();
  // Rename "bar.BazApi/..." back to "foo.BazApi/..." and reuse the same handler
  String newName = desc.getFullMethodName().replace("bar.BazApi/", "foo.BazApi/");
  desc = desc.toBuilder().setFullMethodName(newName).build();
  fooDefBuilder.addMethod(desc, barMethodDef.getServerCallHandler());
}
serverBuilder.addService(fooDefBuilder.build()); // register the old name, "foo.BazApi"
Using the lower-level "channel" API you can make a proxy without too much work. You mainly just proxy events from a ServerCall.Listener to a ClientCall and the ClientCall.Listener to a ServerCall. You get to learn about the lower-level MethodDescriptor and the rarely-used HandlerRegistry. There's also some complexity to handle flow control (isReady() and request()).
I made an example a while back, but never spent the time to merge it to grpc-java itself. It is currently available on my random branch. You should be able to get it working just by changing localhost:8980 and by re-writing the MethodDescriptor passed to channel.newCall(...). Something akin to:
MethodDescriptor desc = serverCall.getMethodDescriptor();
if (desc.getFullMethodName().startsWith("foo.BazApi/")) {
  // Old clients call foo.BazApi; rewrite to the renamed bar.BazApi before proxying
  String newName = desc.getFullMethodName().replace("foo.BazApi/", "bar.BazApi/");
  desc = desc.toBuilder().setFullMethodName(newName).build();
}
ClientCall<ReqT, RespT> clientCall =
    channel.newCall(desc, CallOptions.DEFAULT);

How does R handle closing of database connections

If I create a database connection within a function, the connection object gets destroyed when the function finishes executing. Does this reliably close the database connection, or would it be better to close it manually first?
Why I need to know this:
I am working on a package that creates database connections on the fly with either RODBC or RJDBC as the backend. I designed my function interfaces so that you can pass in either a username and password, or a connection object. When I pass in a connection object I usually do not want the connection to be closed when the function terminates, while when I pass in a username and password I do want it to be closed.
If I do not have to worry about open connections, it simplifies things a lot for me and saves me a lot of headaches.
Answer & More:
I marked Benjamin's answer as the answer since it gives good advice, though what I was actually looking for is closer to Marek's comment (paraphrased): connections can remain open after the connection object is destroyed, and there is then no way to access them from R any more.
I ended up going for a solution that involves creating an R6 class and defining a finalize() method that closes the connection (it's more powerful than on.exit()), but that is beyond the scope of this question.
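For reference, a minimal sketch of that R6 pattern (the class name, field and connection string here are placeholders, not the package's actual code):
library(R6)

ManagedConnection <- R6Class("ManagedConnection",
  public = list(
    conn = NULL,
    initialize = function(connection_string) {
      self$conn <- RODBC::odbcDriverConnect(connection_string)
    },
    # Runs when the object is garbage collected (and at the end of the R session),
    # so the connection gets closed even without an explicit call
    finalize = function() {
      if (!is.null(self$conn)) {
        RODBC::odbcClose(self$conn)
        self$conn <- NULL
      }
    }
  )
)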
I write a lot of functions that create ODBC connections when they are called. My standard practice is:
conn <- RODBC::odbcDriverConnect(...)
on.exit(RODBC::odbcClose(conn))
By putting the creation of the object and the code for its closing next to each other, I know that the connection will be closed when the function is finished. Using on.exit has the added advantage of closing the connection even if the function stops on an error.
Edit:
In the problem as described in your edit, I think the same pattern is relevant. You need to declare on.exit so that it only gets called when you don't provide the connection object.
db_function <- function(conn = NULL, uid, pwd){
  if (is.null(conn)){
    conn <- RODBC::odbcDriverConnect(...) # Build conn with uid and pwd
    on.exit(RODBC::odbcClose(conn))       # Only close connections opened here
  }
  # ... work with conn ...
}
A trivial example to show bypassing on.exit:
test_fun <- function(on_exit = FALSE){
  if (on_exit) on.exit(print("hello world"))
  "Look at me"
}
test_fun()
test_fun(TRUE)

moq returning dataReader

I'm having a strange experience with moq/mocking.
I'm trying to mock the data going into a method so that I don't have to have a database available at test time.
So I'm loading in some data I've previously serialised.
I load it into a DataTable, then create a data reader from it, because my business layer method expects a data reader.
I then create a mock for my data layer and set the return value for a particular method to my new data reader.
Finally, I set (inject) my mock data layer into my business layer so it can do the work of returning the data when the time comes:
var dataTable = DataSerialisation.GetDataTable("C:\\data.xml");
IDataReader reader = dataTable.CreateDataReader();
var mock = new Mock<IRetailerDal>();
mock.Setup(x => x.ReadRetailerDetails("00")).Returns(reader);
retailersBusinessLayer.RetailerDal = mock.Object;
var r = retailersBusinessLayer.GetRetailerDetail("00");
Now, when GetRetailerDetail is called, it basically gets to while(data.Read()) and crashes out, but only sometimes.
I get the exception:
System.InvalidOperationException : DataTableReader is invalid for current DataTable 'Table1'.
Other times it moves past that and can read some columns' data, but other columns don't exist (which must be to do with my serialisation method).
Well, this isn't exactly a satisfactory answer, but the code works now.
It's similar to the question linked here, in that no reason was found.
Anyway, as stated above, the issue was occurring inside my GetRetailerDetail method: where the code hits while(data.Read()), it throws the error.
The fix: change the name of the data reader variable, i.e. it was "data" and is now "data2". That's all I changed.

difficult synchronization problem with FLEX commands (in cairngorm)

My problem, simplified:
I have a DataGrid with a dataProvider "documents".
A column of the DataGrid has a labelFunction that takes the project_id field of the document and returns the project name, looked up from a bindable variable "projects".
Now, I dispatch the events to download the documents and the projects from the server, but if the documents get downloaded before the projects, the label function gives an error (no "projects" variable).
Therefore, I must serialize the commands being executed: the getDocuments command must execute only after the getProjects command.
In the real world, though, I have dozens of resources being downloaded, and those commands are not always grouped together (so I can't, for example, execute the second command from the onSuccess() method of the first, because they are not always executed together).
I need a simple solution.. I need an idea..
If I understand you correctly, you need to serialize the replies from the server. I have done that by using AsyncToken.
The approach: Before you call the remote function, add a "token" to it. For instance, an id. The reply from the server for that particular call will then include that token. That way you can keep several calls separate and create chains of remote calls.
It's quite cool actually:
private var service:RemoteObject;
// ...
var call:AsyncToken = service.theMethod.send();
call.myToken = "serialization id";

private function onResult(event:ResultEvent):void
{
    // Fetch the serialization id and do something with it
    var serId:String = event.token.myToken;
}
