Cassandra Database is not connecting with R via Rcassandra

Cassandra Database is not connecting with R via Rcassandra - r

When I'm connecting to Cassandra database using RCassandra package, connection is establishing. But When trying to use any keyspace, R is not responding.I used the following statements.
library(RCassandra)
rc <- RC.connect(host ="localhost", port = 9042)
RC.use(rc, "db1", cache.def = TRUE)
Any sugestions Please

Your problem is that you're specifying the port directly, and you're using the port of the native protocol, while RCassandra uses thrift protocol (that uses port 9160), so when it's talking to port 9042, it simply don't understand what it says. So you need to either remove port argument completely, or specify it as 9160, and make sure that you have start_rpc parameter set to true in the cassandra.yaml.
I've looked into source code of the RCassandra, and see that it wasn't updated for more than 5 years. And as it uses Thrift instead of native protocol, then you have many limitations comparing to use of native protocol. And support for Thrift will be removed in the next major version of Cassandra - 4.0. The better alternative will be to write a wrapper around DataStax C/C++ driver, and expose underlying functionality to R.

Related

GremlinClient management

What is the Best Practice for managing the GremlinClient object in C#? Is it better to create a SingleInstance (via Dependency Injection) or using Dispose on the object after each call or with a using
using (var client = new GremlinClient(...))
{
var results = client.SubmitAsync(query);
}
Since its a Sockets connection to the server, I assumed that reusing the client was the best practice, but I've been getting this error and I haven't been able to determine the root cause.
Unable to read data from the transport connection:
An existing connection was forcibly closed by the remote host.

The recommendation is to use just one GremlinClient object and to reuse that across your application because the client uses a connection pool so the same connection can be used again for different requests instead of having to create and tear down one connection for each request. The driver also supports request pipelining since version 3.4.0 which means that the same connection can be used for different requests in parallel which reduces the number of required connections. That can of course also only work if you reuse the client.
The problem you describe however could be caused by a bug in the Gremlin.Net driver. If you are using version 3.4.0 of Gremlin.Net, then it could be this bug that can lead to a huge number of created connections. In that case, you can avoid it by downgrading to a 3.3.5 until we release a fix with version 3.4.1. If you also see the problem in version 3.3.5, then please create an issue in TinkerPop's issue tracker to describe the problem.
There could of course also be another reason for the server to close the connection. That would need to be investigated in more depth to understand the reason, e.g., with a tool like Wireshark by inspecting the network traffic between the driver and Cosmos DB.

Is encryption at rest supported on remote protocol in OrientDB?

In the documentation of OrientDB it mentioned that encryption at rest is not supported on remote protocol yet. It can be used only with plocal.
Currently we are using the OrientDB version 2.2.22. Database encryption is mandatory for us. We were previously using OrientDB in plocal mode, but now we have a new requirement in which multiple processes from different JVMs need to connect with same OrientDB database, which is not possible in plocal model.
Is there any way we can achieve it? Is there any workaround? Is this feature going to be supported in upcoming releases?

If you start your server and provide the key at startup, from that point on, the database is accessible via remote. So it would work. I suggest encrypting the TCP/IP connection too at that point.

No, it cannot currently be done:
NOTE: Encryption at rest is not supported on remote protocol yet. It can be used only with plocal.
Given your new requirements, it seems like OrientDB is not the right choice for you anymore.

How can i perform normal R-functions for hadoop remote on SQL Server?

how can I perform normal R-Code on a SQL Server without using the Microsoft rx-functions? I think the ComputeContext "RxInSqlServer" isn't the right one? But I couldn't find good Information about the other ComputeContext-options.
Is this possible with this Statement?
rxSetComputeContext(ComputeContext)
Or can I only use it to perform rx-functions? An other Option could be to set the Server Connection in RStudio or VisualStudio?
My Problem is: I want analyse data from hadoop via ODBC-Connection on the SQL Server, so I would like to use the performance of the remote SQL Server and not the data in SQL Server. And then I want analyse the hadoop-data with sparklyr.
Summary: I want to use the performance from the remote server and not the SQL Server data. So RStudio should run not local, it should perform and use the memory of the remote server.
Thanks!

The concept of a compute context in Microsoft R Server is, “Where will the computation be performed?”
When setting compute context, you are telling Microsoft R Server that computation will occur on either the local machine (with either “local” or “localpar” compute contexts), or, the script will be executed on a remote machine which has Microsoft R Server installed on it. Remote compute contexts are defined by creating a compute context object, and then setting the context to that object.
For SQL Server, you would create an RxInSqlServer() object, and then call rxSetComputeContext() on that object. For Hadoop, the object would be created via the RxHadoopMR() call.
In code, it would look something like:
CC <- RxHadoopMR( < context defined here > )
rxSetComputeContext(CC)
To see usage on defining a context, please see documentation (Enter "?RxHadoopMR" in the R Client, no quotes).
Any call to an "rx" function after this will be performed on the Hadoop cluster, with no data being transferred to the client; other than the results.
RxInSqlServer() would follow the same pattern.
Note: To perform any remote computation, Microsoft R Server must be installed on that machine.
If you wish to run a standard R function on a remote compute context, you must wrap that function in a call to rxExec(). rxExec() is desinged as an interface to parallelize any Open Source R function and allow for its execution on a remote context. Please see documentation (enter "?rxExec" in the R Client, no quotes) for usage.
For information on efficient parallelization, please see this blog: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2016/11/14/performance-optimization-when-using-rxexec-to-parallelize-algorithms/
You called out "without using the Microsoft rx-functions" and I am interpreting this as, "I would like to use Open Source R Algorithms on data in-SQL Server", with Microsoft R Server, you must use rxExec() as the interface to run Open Source R. If you want to use no rx functions at all, you will need to query the data to your local machine, and then use Open Source R. To interface with a remote context using Microsoft R Server, the bare minimum is using rxExec().
This is how you will be able to achieve the first part of your ask, "how can I perform normal R-Code on a SQL Server without using the Microsoft rx-functions? I think the ComputeContext "RxInSqlServer" isn't the right one?"
For your second ask, "My Problem is: I want analyse data from hadoop via ODBC-Connection on the SQL Server, so I would like to use the performance of the remote SQL Server and not the data in SQL Server. And then I want analyse the hadoop-data with sparklyr."
First, I'd like to comment that with the release of Microsoft R Server 9.1, you can use sparklyr in-line with an MRS Spark connection, for some examples, please see this blog: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/04/19/new-features-in-9-1-microsoft-r-server-with-sparklyr-interoperability/
Secondly, what you are trying to do is very involved. I can think of two ways that this is possible.
One is, if you have SQL Server PolyBase, you can configure SQL Server to make a virtual table referencing data in Hadoop, similar to Hive. After you have referenced your Hadoop data in SQl Server, you would use an RxInSqlServer() compute context on these tables. This would analyse the data in SQL Server, and return the results to the client.
Here is a detailed blog explaining an end-to-end setup on Cloudera and SQL Server: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2016/10/17/integrating-polybase-with-cloudera-using-active-directory-authentication/
The Second, which I would NOT recommend, is untested, hacky, and has the following prereqs:
1) Your Hadoop cluster must have OpenSSH installed and configured
2) Your SQL Server Machine must have the ability to SSH into your Hadoop Cluster
3) You must be able to place an SSH Key on your SQL Server machine in a directory which the R Services process has the ability to access
And I need to add another disclaimer here, there is No Guarantee of this working, and, likely, it will not work. The software was not designed to operate in this fashion.
You would then do the following:
On your client machine, you would define a custom function which contains the analysis that you wish to perform, this can be Open Source R Function, rx functions, or a mix.
In this custom function, before calling any other R or rx functions, you would define a RxHadoopMR compute context object which points to your cluster, referencing the SSH key in the directory on the SQL Server machine as if you were executing from that machine. (in the same way that you would define the RxHadoopMR object if you were to do a remote Hadoop operation from your client machine).
Within this custom function, immediately after RxHadoopMR() is defined, you would call rxSetComputeContext() on your defined RxHadoopMR() object
Still in this custom function, write the actual script which will operate on the data in Hadoop.
After this function is defined, you would define an RxInSqlServer() compute context object on the client machine.
You would set your compute context to RxInSqlServer()
Then you would call rxExec() with your custom function as an input.
What this will do is execute your custom function on the SQL Server machine, which would hopefully cause it to define its compute context as your Hadoop cluster, and pull the data over SSH for analysis on the SQL Server machine; returning the results to client.
With that said, this is not how Microsoft R Server was designed to be used, and if you wish to optimize performance, please use Option One and configure PolyBase.

Is it possible to use single table for multiple connections in cassandra

Is it possible to use single table for multiple connections in cassandra?
I am using a spring data for java connection and am trying to use python flask for raspberry connection.

Yes, you can use single tale for multiple connections. I'm using Cassandra 2.2 where I connect using Java Astyanax driver and python pycassa client.
Are you referring about table lock for transactions? Cassandra does not use any locking mechanism on data, the below docs link describes more on transactions and consistency in Cassandra:
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_about_transactions_c.html
Mention what you tried and problems you faced.

Creating a GPRS Connection using Windows Mobile 6.5

I am writing an application in which I need to make a GPRS connection.
Can someone please help me how I can setup this connection using C#.
thanks
EDIT
I will need to connect a GPRS connection because I need to call a webservice.

To make a GPRS connect using a dial up connection subsystem from .NET CF on PocketPC, you can use Connection Manager functions such as ConnMgrEnumDestinations, ConnMgrEstablishConnection and ConnMgrReleaseConnection. Check out http://msdn.microsoft.com/library/default.asp?url=/library/en-us/APISP/html/sp_cnmn_connection_manager.asp for details. There are no classes available for this in .NET CF, but it can be done quite easily using P/Invoke if you have some experience with this.
The following blog post covers this in detail and also contains some C# code: http://blogs.msdn.com/anthonywong/archive/2006/03/13/550686.aspx.
Another solution is using the Smart Device Framework by OpenNETCF.org that contains a wrapper class for the Connection Manager:
http://www.opennetcf.org/downloads/bin/SmartDeviceFramework14.zip
It is free for any commercial or noncommercial purpose up to the version 1.4. It also includes the source code, so you might either use it as is or as a reference for your own implementation if you prefer.
(solution taken from our website at http://forum.rebex.net/questions/503/how-to-establish-a-gprs-connection-for-ftp-use-on-net-cf)

This page might help...
http://msdn.microsoft.com/en-us/library/bb840031.aspx

Is it really important to explicitly create that connection? If you initiate any outgoing (eg. not localhost) connection (like a HttpWebRequest), the OS will automatically connect to the Internet using the preferred connection, which can be GPRS.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Cassandra Database is not connecting with R via Rcassandra - r

Related

GremlinClient management

Is encryption at rest supported on remote protocol in OrientDB?

How can i perform normal R-functions for hadoop remote on SQL Server?

Is it possible to use single table for multiple connections in cassandra

Creating a GPRS Connection using Windows Mobile 6.5

Categories

Resources