Kafka consumer for fetching data in R

I am working with Kafka. I have created a Kafka producer on my server, and I want to read the data it produces from my local system in R.
I have tried the following code in R:
library(rkafka)
consumer1<-rkafka.createConsumer("ipaddress:9092","mytest")
consumer11 <- rkafka.read(consumer1)
It throws the following error:
[1] "Java-Object{com.musigma.consumer.MuConsumer#3349e9bb}"
Unable to connect to zookeeper server
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 100000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:156)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:114)
at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:65)
at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:67)
at kafka.consumer.Consumer$.createJavaConsumerConnector(ConsumerConnector.scala:100)
at kafka.consumer.Consumer.createJavaConsumerConnector(ConsumerConnector.scala)
at com.musigma.consumer.MuConsumer.CreateConsumer(MuConsumer.java:99)
java.lang.NullPointerException
at com.musigma.consumer.MuConsumer.startConsumer(MuConsumer.java:133)
ZooKeeper is running successfully on that IP address.

The first parameter of rkafka.createConsumer is the ZooKeeper connection string, and ZooKeeper listens on port 2181 by default.
You've given it the Kafka broker port (9092), so pass "ipaddress:2181" as the first argument instead.
Source - https://github.com/cran/rkafkajars/blob/master/java/com/musigma/consumer/MuConsumer.java#L87
Note: That library doesn't appear to be maintained, and connecting a consumer through ZooKeeper is practically deprecated, so consider finding another library.
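Before retrying from R, it can help to confirm that ZooKeeper really answers on the port you pass in. The following is a minimal Python sketch (not part of rkafka) that sends ZooKeeper's "ruok" four-letter command and checks for the "imok" reply. The function name zookeeper_ruok is hypothetical. Newer ZooKeeper releases may require this command to be whitelisted on the server, in which case a False result does not prove the server is down.

```python
import socket


def zookeeper_ruok(host, port=2181, timeout=5.0):
    """Return True if a ZooKeeper server at host:port answers 'ruok' with 'imok'.

    Hypothetical helper for diagnosis only. Any connection error, timeout,
    or unexpected reply is reported as False.
    """
    reply = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # The expected reply is exactly four bytes; stop once we have them
            # or once the server closes the connection.
            while len(reply) < 4:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        return False
    return reply.strip() == b"imok"
```

Calling zookeeper_ruok("ipaddress", 2181) from the machine that runs R shows whether the ZooKeeper port is reachable from there; pointing the same call at 9092 should return False, since that port speaks the Kafka protocol.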

Related

How to make a client / server connection using Rserve and Windows Server 2008

I am searching for a robust solution to perform extensive computations on a remote server dedicated to computational tasks. The server runs Windows 2008 R2 and has R x64 3.4.1 installed. I've searched for free solutions and am now focusing on the Rserve/RSclient packages.
However, I can't connect any client (using RSclient) to the running server instance.
This is how I'm currently proceeding on the server side:
library(Rserve)
run.Rserve(config.file = "Rserv.conf")
using the following Rserv.conf file:
port 6311
remote enable
plaintext enable
control enable
r-control enable
The server is now instantiated inside the R session (it's a bit ugly, but I will change that later on):
running Rserve in this R session (pid=...), 1 server(s)
Now I'm trying to connect from a remote computer (client side) using:
library(RSclient)
c = RS.connect(host = "...")
The connection then seems to succeed, checking for c:
> c
Rserve QAP1 connection 0x000000000fbe9f50 (socket 764, queue length 0)
The error occurs when I try to evaluate anything, for example:
> RS.server.eval(c,"0<1")
Error in RS.server.eval(c, "0<1") : command failed with status code 0x4e: no control line present (control commands disabled or server shutdown)
I've read the available guides but still failed in connecting. What is wrong? It seems to be related to control lines but I authorized them in the config file.
For me, the problem was solved by starting the Rserve instance from the command line:
R CMD Rserve --RS-port 9000 --RS-enable-remote --RS-enable-control
instead of starting it from within an R session (library(Rserve), run.Rserve(config.file = "Rserv.conf")). You may try this on Windows as well.
Refer to https://github.com/s-u/Rserve/wiki/rserve.conf. In your configuration:
port 6311
remote enable -> it should be remote true
plaintext enable
control enable
r-control enable
Likewise, check the remaining settings against the linked page and use the values it documents.

Passing hostname to netty

Background: I've got two machines with identical hostnames, and I need to set up a local Spark cluster for testing. Setting up a master and a worker works fine, but running an application with the driver causes problems: netty doesn't seem to pick the correct host (regardless of what I put in there, it just picks the first host).
Identical hostname:
$ dig +short corehost
192.168.0.100
192.168.0.101
Spark config (used by master and the local worker):
export SPARK_LOCAL_DIRS=/some/dir
# I also tried various 192.168.0.x addresses for local, master and the driver
export SPARK_LOCAL_IP=corehost
export SPARK_MASTER_IP=corehost
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_DIR=/some/dir
Spark starts up and I can see the worker in the web-ui.
When I run the spark "job" below:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("AaA")
  // tried 192.168.0.x and localhost
  .setMaster("spark://corehost:7077")
val sc = new SparkContext(conf)
I get this exception:
15/04/02 12:34:04 INFO SparkContext: Running Spark version 1.3.0
15/04/02 12:34:04 WARN Utils: Your hostname, corehost resolves to a loopback address: 127.0.0.1; using 192.168.0.100 instead (on interface en1)
15/04/02 12:34:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/02 12:34:05 ERROR NettyTransport: failed to bind to corehost.home/192.168.0.101:0, shutting down Netty transport
...
Exception in thread "main" java.net.BindException: Failed to bind to: corehost.home/192.168.0.101:0: Service 'sparkDriver' failed after 16 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Process finished with exit code 1
Not sure how to proceed... it's a whole jungle of IP addresses.
I'm not sure whether this is a netty issue either.
My experience with the identical problem is that it revolves around setting things up locally. Try being more explicit in your Spark driver code by adding the local IP (the SPARK_LOCAL_IP value) and the driver host IP to the config:
val conf = new SparkConf().setAppName("AaA")
  .setMaster("spark://localhost:7077")
  .set("spark.local.ip","192.168.1.100")
  .set("spark.driver.host","192.168.1.100")
This should tell netty which of the two identical hosts to use.
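To decide which address to put into those settings, you can check which of the addresses behind the shared hostname actually belongs to the current machine. The sketch below is Python rather than Scala and is only a diagnostic: it assumes the bind failure happens because the chosen address lives on the other machine. resolve_ipv4, can_bind and local_addresses are hypothetical helper names, not Spark APIs.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def can_bind(ip):
    """Return True if this machine can bind a TCP socket to ip on an ephemeral port.

    Binding normally succeeds only for addresses assigned to a local interface,
    which mirrors what a driver needs when it binds to host:0.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))
            return True
        except OSError:
            return False


def local_addresses(hostname):
    """Return the resolved addresses of hostname that are bindable on this machine."""
    return [ip for ip in resolve_ipv4(hostname) if can_bind(ip)]
```

Running local_addresses("corehost") on each machine should list only the address that machine owns, which is the candidate value for spark.driver.host there.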

Jmap - Error connecting to remote debug server

My requirement is to create a heap dump file from a remote server using jmap.
I did it this way:
jmap -dump:file=remoteDump.txt,format=b 3104
This worked fine, as 3104 is the pid of a process on my local machine.
How do I do the same with a remote server?
I tried:
jmap -dump:file=remoteDump.txt,format=b 3104 54.197.228.33:8080
But it failed.
I then tried creating a debug server using jsadebugd, as below.
1. Started rmiregistry
rmiregistry -J-Xbootclasspath/p:$JAVA_HOME/lib/sa-jdi.jar
2. Ran jsadebugd
>jsadebugd 11594 54.197.228.33:9009
But step 2 throws the following error:
Error attaching to process or starting server: sun.jvm.hotspot.debugger.D
Exception: Windbg Error: WaitForEvent failed!
at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.attach0(Na
thod)
at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.attach(Win
ggerLocal.java:152)
at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:
at sun.jvm.hotspot.HotSpotAgent.setupDebuggerWin32(HotSpotAgent.j
)
at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:3
at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:313)
at sun.jvm.hotspot.HotSpotAgent.startServer(HotSpotAgent.java:220
at sun.jvm.hotspot.DebugServer.run(DebugServer.java:106)
at sun.jvm.hotspot.DebugServer.main(DebugServer.java:45)
at sun.jvm.hotspot.jdi.SADebugServer.main(SADebugServer.java:55)
Help me get out of it.
You may be unable to attach because the process is already attached to another debugger, or because it is running on a different virtual machine than the one your jmap belongs to.
Make sure the process is not attached to any debugger and that you attach using the same VM.

Hive ODBC Driver DSN setup Issue

I was trying to set up an ODBC connection for Hive. I followed the steps below, but it didn't work.
User DSN --> Add --> Hortonworks Hive ODBC Driver, where I entered the following details:
Host : IP of the Primary name node cluster
Port:10001
Server Type : Hive Server 2
Authentication Mechanism : User Name --> hadoop
While testing the connection, it throws the following error:
Error:
Driver Version: V1.2.13.1018
Running connectivity tests...
Attempting connection
Failed to establish connection
SQLSTATE: HY000[Hortonworks][Hardy] (34) Error from Hive: connect() failed: errno = 10061.
TESTS COMPLETED WITH ERROR.
Could you please tell me if the port I use is correct? If not, what port should I try? Port 10000 doesn't work either.
I am using HDP 2.0 on Windows Server 2012 R2 (single-node cluster). I installed the Hive ODBC Driver from the Microsoft site. I gave my host name, port 10001, and hive as the user; when I installed HDP 2.0 on Windows Server 2012 R2, I had set the Hive user name to hive. I am able to connect successfully.
First of all, check on your virtual machine that port 10000 has been added, because it is not added by default.
If the port is there, check from your virtual machine whether the Hive server is actually running.
I hope this helps.
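errno 10061 is the Windows socket code for "connection refused", which typically means nothing accepted the connection on that host and port. A quick way to check this independently of the ODBC driver is a plain TCP probe. The following is a minimal Python sketch; port_is_open and probe_ports are hypothetical helper names, and a successful probe only shows that something is listening, not that it is HiveServer2.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def probe_ports(host, ports, timeout=3.0):
    """Return a dict mapping each port in ports to whether it accepted a connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For the setup in the question, probe_ports("your-hive-host", [10000, 10001]) run from the client machine would show which of the two candidate ports, if either, is reachable; the host name here is a placeholder.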
Under the authentication mechanism, I changed it to User Name only.

Impala 1.2.1 ERROR: Couldn't open transport for localhost:26000(connect() failed: Connection refused)

Using impala-shell, I can see the Hive metastore, use any database created by Hive and query any table created by Hive. When I try to create a table in impala-shell or run "invalidate metadata", I get:
"ERROR: Couldn't open transport for localhost:26000(connect() failed: Connection refused)"
I have the following configuration. This is a multi-node cluster built by hand, i.e. without using Cloudera Manager:
CentOS 6
CDH4.5
Impala 1.2.1
Hive MySQL Metastore
impalad is running on multiple nodes, alongside the data nodes
statestored and catalogd are running on a single node that is NOT an impalad node
In /etc/default/impala I have changed IMPALA_STATE_STORE_HOST to point to the IP of the statestored machine.
From /var/log/impala/catalogd.INFO, it seems port 26000 is used by the catalog service, as the file contains the line "--catalog_service_port=26000".
Just as /etc/default/impala has to tell impalad where the statestore is (using IMPALA_STATE_STORE_HOST), I am wondering whether 1.2.1 (where catalogd was introduced) needs an additional entry for the catalogd location as well. It's just a guess.
Any help is appreciated.
Thanks,
You have to start impalad with the option -catalog_service_host=fqdn_to_your_catalog_host.
Unfortunately, this is not yet in the default configuration, so you have to add it yourself.
In /etc/default/impala, add:
CATALOG_SERVICE_HOST=fqdn_to_your_catalog_host
and append the following to the existing IMPALA_SERVER_ARGS value:
-catalog_service_host=${CATALOG_SERVICE_HOST}
Restart impalad and it should work now :-)
