AMQP consumer loads CPU to 100% - Symfony

I have a Symfony2 application with RabbitMQBundle installed. I've set up consumers and producers as described in the bundle documentation and everything works correctly. But my consumers, started with ./app/console rabbitmq:consumer, take all available CPU time. Basically the consumer does nothing but wait for a message and output it. If I start the demo consumer from php-amqplib, CPU consumption is almost zero. I tried different versions of Symfony (2.6 and 2.3) but this does not affect CPU load. My server configuration:
Debian 7
PHP 5.6.4 (also tried 5.4)
no database used
RabbitMQ 3.4.2
Is there any way to reduce CPU consumption? Thanks

I just ran into a very similar issue and after some debugging realized that I was using an old way of instantiating the connection to RabbitMQ.
The new signature of the method is described here: https://github.com/videlalvaro/php-amqplib/blob/master/PhpAmqpLib/Connection/AbstractConnection.php#L136
I was passing in something that looked more like this:
$this->connection = new Connection\AMQPConnection(
    $server->host,
    $server->port,
    $server->user,
    $server->password,
    $server->vhost,
    $server->insist,
    $server->login_method,
    $server->locale,
    $server->connection_timeout,
    $server->read_write_timeout,
    $server->context,
    $server->keepalive,
    $server->heartbeat
);
That matches a very old definition from somewhere around version 2: https://github.com/videlalvaro/php-amqplib/blob/v2.0.0/PhpAmqpLib/Connection/AMQPConnection.php#L31
So your plugin seems to use a new version of the library but not the new way of initiating a connection.
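If your case is the same, the fix is just to pass the arguments in the order of the current signature. A rough sketch of the corrected call, assuming your $server object exposes the same fields as above - note the extra $login_response argument (null here) that now sits before $locale and shifts the remaining parameters; AMQPStreamConnection is the non-deprecated name that AMQPConnection aliases in recent versions:
$this->connection = new Connection\AMQPStreamConnection(
    $server->host,
    $server->port,
    $server->user,
    $server->password,
    $server->vhost,
    $server->insist,
    $server->login_method,
    null,                          // $login_response
    $server->locale,
    $server->connection_timeout,
    $server->read_write_timeout,
    $server->context,
    $server->keepalive,
    $server->heartbeat
);
Passing the old 13-argument list shifts read_write_timeout, keepalive and heartbeat into the wrong slots, which would be consistent with a consumer busy-waiting at 100% CPU.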

Related

Am I missing something when using MassTransit and AmazonSQS in a large project?

I'm using MassTransit in a project with AmazonSQS, and since I updated the packages to the latest version 7.3 I'm getting this exception:
---> Amazon.SimpleNotificationService.AmazonSimpleNotificationServiceException: Rate exceeded
---> Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
Sometimes the exception comes from SQS instead. The thing is, when I was working with version 6 I didn't get these exceptions.
This solution has three projects:
Two web applications (which produce the messages)
BackgroundService (which receive and process the messages)
I designed this system using the CQRS pattern with several commands, and for that reason it creates 100 topics. I don't know if I need to consider some limits, either from AWS or from MassTransit.
Can someone help me? Thanks

How can I configure a Kafka batch consumer to retry a pre-defined number of times using SeekToCurrentBatchErrorHandler?

I'm using spring-kafka '2.2.7.RELEASE' to create a batch consumer and I'm trying to understand how I can configure a Kafka batch consumer to retry a pre-defined number of times using SeekToCurrentBatchErrorHandler.
I see that one of the SeekToCurrentErrorHandler constructors takes 'maxFailures' as an argument, but I don't see any such option for SeekToCurrentBatchErrorHandler. Please suggest.
2.2.x is no longer supported.
See the documentation for the reasons why recovery after some number of failures is not supported with batch listeners and older versions of the framework.
You can use the RetryingBatchErrorHandler (since 2.3.7) or RecoveringBatchErrorHandler (since 2.5.0) instead.
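For illustration, a minimal sketch of wiring a RetryingBatchErrorHandler into a batch container factory - the bean name, the String types and the logging recoverer are just placeholders; FixedBackOff(1000L, 2L) means two retries, one second apart, before the recoverer is called:
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.RetryingBatchErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> batchFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setBatchListener(true);
    // Retry the failed batch twice with a 1s back off, then hand each record to the recoverer.
    factory.setBatchErrorHandler(new RetryingBatchErrorHandler(
            new FixedBackOff(1000L, 2L),
            (record, ex) -> System.err.println("Giving up on " + record + ": " + ex)));
    return factory;
}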

Could not create internal topics - Stream-thread exception

I am trying to execute a simple WordCount streams application, but I face the error "Could not create internal topics - Stream-thread exception".
I have seen a similar thread but that seems to be more of a network issue.
There is no security enabled on the Kafka broker.
Only one broker is configured, and I still see this issue.
Can someone let me know how to fix this?
Clean up your temporary Kafka topics.
Run the --list command on Kafka to see all the topics starting with your application name and ending with -changelog and -repartition, and manually delete them.
This one worked for me.
Also, check your delete.topic.enable setting so that the deletion actually happens. It was not enabled by default until 1.0.0 - see https://issues.apache.org/jira/browse/KAFKA-5384
I connected to Kafka using Kafka Tool and deleted them manually.
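For reference, the listing and deletion can also be done with the kafka-topics tool that ships with the broker - a rough sketch, assuming a local broker, an application.id of wordcount, and a hypothetical changelog topic name (older brokers take --zookeeper localhost:2181 instead of --bootstrap-server):
# List the internal topics belonging to the streams application
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list | grep -E '^wordcount.*(-changelog|-repartition)$'
# Delete one of them (requires delete.topic.enable=true on the broker)
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic wordcount-counts-store-changelog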

Running an Apache Spark job from a Spring web application using the YARN client or any alternative way

I have recently started using Spark and I want to run a Spark job from a Spring web application.
I have a situation where I am running the web application in a Tomcat server using Spring Boot. My web application receives a REST web service request, and based on that it needs to trigger a Spark calculation job in a YARN cluster. Since my job can take longer to run and accesses data from HDFS, I want to run the Spark job in yarn-cluster mode, and I don't want to keep a Spark context alive in my web layer. Another reason for this is that my application is multi-tenant, so each tenant can run its own job; in yarn-cluster mode each tenant's job can start its own driver and run in its own Spark cluster. In the web app JVM, I assume I can't run multiple Spark contexts in one JVM.
I want to trigger Spark jobs in yarn-cluster mode from Java code in my web application. What is the best way to achieve this? I am exploring various options and looking for your guidance on which one is best:
1) I can use the spark-submit command line shell to submit my jobs. But to trigger it from my web application I need to use either the Java ProcessBuilder API or some package built on ProcessBuilder. This has two issues. First, it doesn't sound like a clean way of doing it; I should have a programmatic way of triggering my Spark applications. The second problem is that I will lose the ability to monitor the submitted application and get its status. The only crude way of doing that is reading the output stream of the spark-submit shell, which again doesn't sound like a good approach.
2) I tried using the YARN Client to submit the job from the Spring application. The following is the code I use to submit the Spark job using the YARN Client:
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

Configuration config = new Configuration();
System.setProperty("SPARK_YARN_MODE", "true");
SparkConf conf = new SparkConf();
// sparkArgs holds the spark-submit style arguments for the YARN client (e.g. --jar, --class, --arg)
ClientArguments cArgs = new ClientArguments(sparkArgs, conf);
Client client = new Client(cArgs, config, conf);
client.run();
But when I run the above code, it tries to connect on localhost only. I get this error:
15/08/05 14:06:10 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/08/05 14:06:12 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
So I don't think it can connect to the remote machine.
Please suggest the best way of doing this with the latest version of Spark. Later I plan to deploy this entire application on Amazon EMR, so the approach should work there as well.
Thanks in advance
Spark JobServer might help: https://github.com/spark-jobserver/spark-jobserver. This project receives RESTful web requests and starts a Spark job. Results are returned as a JSON response.
I also had similar issues trying to run a Spark app that connects to a YARN cluster - with no cluster config, it was trying to connect to the local machine as if it were the main node of the cluster, which obviously failed.
It worked for me when I placed core-site.xml and yarn-site.xml on the classpath (src/main/resources in a typical sbt or Maven project structure) - the application then connected to the cluster correctly.
When using spark-submit, the location of those files is typically specified by the HADOOP_CONF_DIR environment variable, but for a stand-alone application it had no effect.
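If putting the files on the classpath is not an option, explicitly loading them into the Hadoop Configuration before building the YARN client should have the same effect - a sketch, assuming the usual /etc/hadoop/conf location on the machine running the web app:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Configuration config = new Configuration();
// Pull in the real cluster addresses (fs.defaultFS, yarn.resourcemanager.address, ...)
// so the client stops retrying 0.0.0.0:8032.
config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
config.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));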

How to use the example of scrapy-redis

I have read the example of scrapy-redis but still don't quite understand how to use it.
I have run the spider named dmoz and it works well. But when I start another spider named mycrawler_redis, it just gets nothing.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
And if the spiders on different machines want to share the same request queue, how can I get that done? It seems that I should first make the slave machine connect to the master machine's redis, but I'm not sure which part to put the relevant code in - in spider.py, or do I just type it on the command line?
I'm quite new to scrapy-redis and any help would be appreciated!
If the example spider is working and your custom one isn't, there must be something that you have done wrong. Update your question with the code, including all relevant parts, so we can see what went wrong.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
As far as your spider is concerned, this is done by appropriate project settings, for example if you want FIFO:
# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Don't cleanup redis queues, allows to pause/resume crawls.
SCHEDULER_PERSIST = True
# Schedule requests using a queue (FIFO).
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'
As far as the implementation goes, queuing is done via RedisSpider, which your spider must inherit from. You can find the code for enqueuing requests here: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/scheduler.py#L73
As for the connection, you don't need to manually connect to the redis machine, you just specify the host and port information in the settings:
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
And the connection is configured in connection.py: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/connection.py
Examples of usage can be found in several places: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/pipelines.py#L17
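Putting the pieces together, a minimal slave-side spider could look like the sketch below (the spider name, redis_key and yielded fields are illustrative, and the import path assumes the RedisSpider class lives in scrapy_redis.spiders); every machine that runs it uses the REDIS_HOST/REDIS_PORT settings above and pulls its requests from the shared queue:
from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = 'myspider_redis'
    # Redis list this spider pops start URLs from; seed it with e.g.
    #   redis-cli lpush myspider_redis:start_urls http://example.com/
    redis_key = 'myspider_redis:start_urls'

    def parse(self, response):
        # Yielded items go through the normal item pipelines (e.g. RedisPipeline if enabled).
        yield {'url': response.url}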
