I'm doing some synthetic testing of Kafka. My goal is to learn what maximum throughput I can achieve in my specific setup. The problem is that after increasing throughput to a certain level (by gradually starting more producer containers) and sustaining it for 10-15 seconds, individual producer container throughput slows down (drops from 16K mps to 8-12K mps, and the consumed message rate drops in line with the producer rate), and then one of two things happens:
either the producers fail with a Local: Queue full error,
or the Kafka container crashes.
While the message rate is sustained, Kafka memory consumption slowly grows from 500 MB to 1.5 GB. Kafka instance CPU usage (as reported by the docker stats command) is around 60-70%, which I believe translates to 0.6-0.7 CPU in Docker configuration terms. Producer and consumer memory and CPU load is uneventful.
I've also noticed that reducing the message size allows the message rate to be sustained longer, but the next rate increase leads to the same symptoms.
My initial suspicion was the amount of memory available to Kafka (I'm not sure how to validate this, as Kafka does not log any exception before crashing), so I significantly reduced retention.ms from 300K to 5K in the hope that this would reduce the amount of memory Kafka needs to maintain the message rate, but it has not helped.
What could be causing the issue? Any steps to help debug the issue are also highly appreciated!
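For context on the first failure mode: Local: Queue full is raised by librdkafka (which Confluent.Kafka wraps) when the client-side send buffer fills up, i.e. messages are produced faster than the broker acknowledges them. The buffer limits are configurable; a minimal sketch (property names are from Confluent.Kafka's ProducerConfig; the quoted defaults are librdkafka's and worth verifying for your client version):

// Sketch: enlarge the client-side buffer so short broker stalls are
// absorbed instead of failing sends with "Local: Queue full".
var config = new ProducerConfig {
    BootstrapServers = "localhost:9092",   // adjust for your setup
    QueueBufferingMaxMessages = 500000,    // librdkafka default: 100000
    QueueBufferingMaxKbytes = 2097151,     // librdkafka default: ~1 GB
    LingerMs = 40
};

A bigger buffer only rides out short stalls; if the broker is genuinely saturated it merely delays the same failure, so the broker-side symptoms still need an explanation.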
The setup looks like this:
Host machine:
CPU: 8 Cores, 16 threads
RAM: 64 GB
OS: PopOS
1 Kafka container
2 CPU (docker-compose deploy > resources > limits > cpus setting)
2 GB RAM (docker-compose deploy > resources > limits > memory, memswap_limit, and the Kafka -Xmx/-Xms settings)
1 Topic, 40 partitions, retention.ms = 5000
4 Producer containers (.Net Confluent.Kafka package)
0.5 CPU
2 GB RAM
1 producer instance per container
10 threads per producer
Thread message rate: 1600 per second
Batch size: 128
LingerMs: 40
Combined produced message rate: 4 x 10 x 1600 = 64K mps
Message size: 1 KB
Using ProduceAsync API method
A message is considered sent when the async DeliveryResult task returned by ProduceAsync has completed
2 Consumer containers (.Net Confluent.Kafka package)
1 CPU
2 GB RAM
1 consumer instance per container
20 threads per consumer
all consumers belong to a single group (so each message is consumed once, by one of the consumers, rather than by each of them)
Core producer code used:
static void Produce(int id, ProducerConfig config, double sendInterval, Options o) {
    Console.WriteLine($"Starting thread: {id}, sleepInterval: {sendInterval}, batchSize: {o.BatchSize}");
    string msg = new string('A', o.MsgSize);
    using (var producer = new ProducerBuilder<Null, string>(config).Build()) {
        while (true) {
            // Fire off a batch of sends without awaiting each one.
            for (int i = 0; i < o.BatchSize; i++) {
                var t = producer.ProduceAsync(o.Topic, new Message<Null, string> { Value = msg });
                Interlocked.Increment(ref messagesSent);
                // Count delivery once the broker acknowledges the message.
                t.ContinueWith(task => {
                    if (task.IsFaulted) {
                        Console.WriteLine($"{task.Exception}");
                    } else {
                        Interlocked.Increment(ref messagesDelivered);
                    }
                });
            }
            Thread.Sleep((int)sendInterval);
        }
    }
}
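A side note on the send loop above: every ProduceAsync call allocates a Task plus a ContinueWith continuation, which is measurable overhead at tens of thousands of messages per second. A lower-overhead sketch using Confluent.Kafka's callback-based Produce overload (same counters as in the loop above):

// Sketch: callback-based Produce avoids a Task allocation per message.
producer.Produce(o.Topic, new Message<Null, string> { Value = msg }, report => {
    if (report.Error.IsError) {
        Console.WriteLine($"Delivery failed: {report.Error.Reason}");
    } else {
        Interlocked.Increment(ref messagesDelivered);
    }
});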
Core consumer code used:
...
for (int i = 0; i < o.ThreadCount; i++) {
    int id = RandomNumberGenerator.GetInt32(10000);
    ConsumerConfig threadConfig = new ConsumerConfig(config);
    threadConfig.GroupInstanceId = id.ToString();
    var consumer = new ConsumerBuilder<Ignore, string>(threadConfig).Build();
    consumers.Add(consumer);
    Thread t = new Thread(() => { Consume(id, consumer, token, o); });
    t.Start();
    threads.Add(t);
}
...
static void Consume(int id, Confluent.Kafka.IConsumer<Ignore, string> consumer, CancellationToken token, Options o) {
    Console.WriteLine($"Starting thread: {id}");
    try {
        consumer.Subscribe(o.Topic);
        while (!token.IsCancellationRequested) {
            var consumeResult = consumer.Consume(token);
            Interlocked.Increment(ref messagesReceived);
        }
    } catch (OperationCanceledException) {
        Console.WriteLine($"Thread cancelled: {id}");
    } finally {
        consumer.Close();
    }
    Console.WriteLine($"Ending thread: {id}");
}
Kafka service definition:
kafka:
  image: 'bitnami/kafka:3.1.1'
  container_name: kafka
  ports:
    - '9092:9092'
    - '6666:6666'
  environment:
    - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
    - ALLOW_PLAINTEXT_LISTENER=yes
    - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    - KAFKA_CFG_LISTENERS=PLAINTEXT://:29092,PLAINTEXT_HOST://:9092
    - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
    - KAFKA_HEAP_OPTS=-Xmx${MEM_MED_JAVA} -Xms${MEM_MED_JAVA}
    - JMX_PORT=6666
    - KAFKA_CFG_BROKER_ID=1
  depends_on:
    - zookeeper
  deploy:
    resources:
      limits:
        cpus: ${CPU_LARGE}
        memory: ${MEM_MED}
  memswap_limit: ${MEM_MED}
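One thing worth flagging in this definition: KAFKA_HEAP_OPTS pins the JVM heap (-Xmx/-Xms) to the same value as the container memory limit. The JVM needs off-heap memory (metaspace, GC overhead, socket buffers) on top of the heap, so with the heap equal to the cgroup limit the kernel OOM killer can terminate the container without Kafka logging any exception, which matches the crash symptom. A sketch of a safer split (values illustrative):

  environment:
    # Sketch: leave headroom between the JVM heap and the container limit,
    # e.g. a 1 GB heap inside a 2 GB container.
    - KAFKA_HEAP_OPTS=-Xmx1g -Xms1g

After a crash, docker inspect kafka --format '{{.State.OOMKilled}}' will report whether the OOM killer was responsible.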
I am trying to load test a Kafka instance on one of our servers.
Here is the code that does it using NBomber:
public static void Run()
{
    var testScenario = NBomber.CSharp.Step.Create("testScenario",
        async context =>
        {
            try
            {
                // The testData is a string variable that reads contents from a text file in the Init method.
                var kafkaObject = new KafkaObject { Topic = TestTopic, Message = testData };
                SampleKafkaFlow sampleKafkaFlow = new SampleKafkaFlow();
                var response = await sampleKafkaFlow.SendMessageToKafka(kafkaObject);
                return Response.Ok();
            }
            catch (Exception ex)
            {
                return Response.Fail(ex.Message);
            }
        });

    var scenario = ScenarioBuilder.CreateScenario("scenario", testScenario)
        .WithoutWarmUp()
        .WithInit(Init)
        .WithLoadSimulations(new[]
        {
            Simulation.InjectPerSec(rate: 100, during: TimeSpan.FromMinutes(3))
        });

    NBomber.CSharp.NBomberRunner
        .RegisterScenarios(scenario)
        .WithReportFileName($"testScenario-Report-{DateTime.UtcNow.ToString("yyyy-dd-M--HH-mm-ss")}")
        .WithReportFolder("test_reports")
        .WithReportFormats(ReportFormat.Html)
        .Run();
}
My laptop configuration:
Core i5 10th Gen with 16 GB RAM, running Windows 10.
At the time of running the load test, only VS 2022 was running.
Now, I assumed that at 100 RPS it would generate a total of 18K requests over the 3 minutes of execution time. The report says otherwise: while it did run for a total of 3 minutes, there were only 2,057 total requests!
What am I missing here?
How do I get to doing a load test with a higher RPS?
Thanks in advance.
There is probably an issue with NBomber itself: https://github.com/PragmaticFlow/NBomber/issues/488
Also check the NBomber log – it may contain many entries like
Error: step unhandled exception: One or more errors occurred. (Too many open files in system
or other errors indicating that the OS is limiting your load test.
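The "Too many open files" errors are also consistent with a detail in the scenario code above: a new SampleKafkaFlow is constructed on every step invocation. If SampleKafkaFlow opens its own Kafka producer connection (its code isn't shown, so this is an assumption), every request pays for a new connection. A sketch that hoists it out of the step delegate:

// Sketch: one SampleKafkaFlow (and one producer connection) for the
// whole scenario, assuming SampleKafkaFlow owns a Kafka producer.
var sampleKafkaFlow = new SampleKafkaFlow();
var testScenario = NBomber.CSharp.Step.Create("testScenario",
    async context =>
    {
        var kafkaObject = new KafkaObject { Topic = TestTopic, Message = testData };
        var response = await sampleKafkaFlow.SendMessageToKafka(kafkaObject);
        return Response.Ok();
    });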
I have a .NET Core 3.1 console application that runs some background cron jobs. There are different jobs doing different things; each starts every x minutes, does its thing, and then stops. Most of the time they run fine, but lately these jobs have started to hang. They just never finish; the process is stuck.
I don't really know how to debug this or how to figure out what is causing it to hang. What I've done is:
Create a DMP file from the task manager
Load this using dotnet dump analyze myfile.DMP
Run dumpasync -stacks
This gives a list of all the stacks. I've created a few DMP files from different workers (doing different things), but they all have in common that there is one task on Npgsql.NpgsqlReadBuffer.
This is one example:
000001ed01aa18c8 00007ffca0c9e3d8 128 1 Npgsql.NpgsqlReadBuffer+<>c__DisplayClass34_0+<<Ensure>g__EnsureLong|0>d
Async "stack":
.000001ed01aa1988 (1) Npgsql.NpgsqlConnector+<>c__DisplayClass160_0+<<DoReadMessage>g__ReadMessageLong|0>d
..000001ed01aa1a40 (5) Npgsql.NpgsqlDataReader+<NextResult>d__44
...000001ed01aa1ae0 (0) Npgsql.NpgsqlCommand+<ExecuteReaderAsync>d__102
....000001ed01aa1b90 (0) Npgsql.NpgsqlCommand+<ExecuteDbDataReaderAsync>d__97
.....000001ed01aa1c10 (1) Dapper.SqlMapper+<QueryAsync>d__33`1[[System.__Canon, System.Private.CoreLib]]
......000001ed021ac5d8 (3) Acme.Common.Data.Dapper.Repositories.AccountItems.GetDapperAccountItemsHandlerSql+<GetAccountItemsAsync>d__3
.......000001ed021ac638 (0) Acme.Common.Data.Dapper.Repositories.ItemRepository`1+<GetAccountItemsHigherThanIdAsync>d__9[[Acme.Core.Db.Dapper.DapperReaderConnection, Acme.Core.Db.Dapper]]
........000001ed021ac698 (1) Acme.Common.Services.EmailReport.ReportDataService+<MakeInstantAlertDto>d__20
.........000001ed00badd90 (3) Acme.Common.Services.EmailReport.ReportDataService+<GetReportDtoAsync>d__19
..........000001ed0105f968 (2) Acme.Common.Services.EmailReport.InstantAlertReportService+<SendInstantAlertReportAsync>d__6
...........000001ed0105f9c8 (0) Acme.Common.Services.EmailReport.EmailReportWorkerService+<SendInstantAlertReportsAsync>d__10
............000001ed01b902d0 System.Threading.Tasks.TaskFactory+CompleteOnInvokePromise
I don't know if this means that npgsql is the cause of the hang, but it seems to be what is common between all of them.
The connection is created like this:
public async Task<IEnumerable<MyDto>> GetData()
{
    using (var dbConnection = await _dapperConnection.OpenAsync())
    {
        var sql = "SELECT * FROM ....";
        var result = await dbConnection.QueryAsync<MyDto>(sql);
        return result;
    }
}

private async Task<NpgsqlConnection> OpenAsync(CancellationToken cancellationToken = default)
{
    var connection = new NpgsqlConnection(_connectionString);
    await connection.OpenAsync(cancellationToken);
    return connection;
}
The connection string looks like this:
User ID=<userid>;Password=<password>;Host=<host>;Port=5432;Database=<databasename>;Pooling=true;Maximum Pool Size=200;Keepalive=30;
How can I debug this further? What would help?
Further technical details
Npgsql version: 4.1.3
PostgreSQL version: 9.6
Operating system: Windows
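One concrete debugging step for a hang like this: give the query a bounded lifetime so a stalled read surfaces as an exception with a stack trace instead of an await that never completes. A sketch using Dapper's CommandDefinition inside the GetData method shown above (the 30-second value is illustrative):

// Sketch: a bounded version of the query from GetData. If the TCP
// connection silently dies, this throws after ~30 s instead of hanging.
var command = new CommandDefinition(sql, commandTimeout: 30);
var result = await dbConnection.QueryAsync<MyDto>(command);

If the bounded version starts throwing timeouts at the same points where the jobs used to hang, that points at the network path or the server rather than at the application code.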
I'm facing a problem with the Kestrel server's performance. I have the following scenario:
TestClient(JMeter) -> DemoAPI-1(Kestrel) -> DemoAPI-2(IIS)
I'm trying to create a sample application that returns file content on request.
TestClient (100 threads) sends requests to DemoAPI-1, which in turn requests DemoAPI-2. DemoAPI-2 reads a fixed XML file (1 MB max) and returns its content as the response (in production, DemoAPI-2 will not be exposed to the outside world).
When I tested direct access from TestClient -> DemoAPI-2, I got the expected (good) result:
Average : 368ms
Minimum : 40ms
Maximum : 1056ms
Throughput : 40.1/sec
But when I tried to access it through DemoAPI-1, I got the following result:
Average : 48232ms
Minimum : 21095ms
Maximum : 49377ms
Throughput : 2.0/sec
As you can see, there is a huge difference; I'm not getting even 10% of DemoAPI-2's throughput. I was told Kestrel is more efficient and faster than traditional IIS. Also, since there is no problem with direct access, I think we can rule out a problem in DemoAPI-2.
※Code of DemoAPI-1 :
string base64Encoded = null;
var request = new HttpRequestMessage(HttpMethod.Get, url);
var response = await this.httpClient.SendAsync(request, HttpCompletionOption.ResponseContentRead).ConfigureAwait(false);
if (response.StatusCode.Equals(HttpStatusCode.OK))
{
    var content = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
    base64Encoded = Convert.ToBase64String(content);
}
return base64Encoded;
※Code of DemoAPI-2 :
[HttpGet("Demo2")]
public async Task<IActionResult> Demo2Async(int wait)
{
try
{
if (wait > 0)
{
await Task.Delay(wait);
}
var path = Path.Combine(Directory.GetCurrentDirectory(), "test.xml");
var file = System.IO.File.ReadAllText(path);
return Content(file);
}
catch (System.Exception ex)
{
return StatusCode(500, ex.Message);
}
}
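One small remark on DemoAPI-2, although the direct-access numbers suggest it is not the bottleneck: File.ReadAllText is synchronous I/O inside an async action, so it blocks a thread pool thread for each request. A sketch of the async variant (available on .NET Core 2.0 and later):

// Sketch: async file read keeps the thread pool thread free during I/O.
var file = await System.IO.File.ReadAllTextAsync(path);
return Content(file);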
Some additional information :
Both APIs are async.
Both APIs are hosted on different EC2 instances (C5.xlarge, Windows Server 2016).
DemoAPI-1 (Kestrel) is a self-contained API (without a reverse proxy).
TestClient (JMeter) is set to 100 threads for this testing.
No other configuration has been done for the Kestrel server as of now.
There are no action filters, middleware, or logging that could affect performance as of now.
Communication is done over SSL on port 5001.
The wait parameter for DemoAPI-2 is set to 0 as of now.
The CPU usage of DEMOAPI-1 does not exceed 40%.
The problem was due to HttpClient's port exhaustion issue.
I was able to solve it by using IHttpClientFactory.
The following article might help someone who faces a similar problem:
https://www.stevejgordon.co.uk/httpclient-creation-and-disposal-internals-should-i-dispose-of-httpclient
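For readers hitting the same problem: using IHttpClientFactory means registering it in DI and resolving clients from it, instead of constructing an HttpClient per request. A minimal sketch (the client name and base address are illustrative, not from the original code):

// Sketch: register a named client once, in Startup.ConfigureServices.
services.AddHttpClient("demoApi2", c =>
{
    c.BaseAddress = new Uri("https://demoapi2.internal:5001"); // hypothetical address
});

// Resolve per request; the factory pools and reuses the underlying
// handlers and sockets, which avoids port exhaustion.
public class Demo1Service
{
    private readonly IHttpClientFactory _factory;
    public Demo1Service(IHttpClientFactory factory) => _factory = factory;

    public async Task<string> GetAsync(string url)
    {
        var client = _factory.CreateClient("demoApi2");
        return await client.GetStringAsync(url);
    }
}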
DEMOAPI-1 performs a non-asynchronous read of the streams:
var bytes = stream.Read(read, 0, DataChunkSize);
while (bytes > 0)
{
    buffer += System.Text.Encoding.UTF8.GetString(read, 0, bytes);
    // Replace with ReadAsync
    bytes = stream.Read(read, 0, DataChunkSize);
}
That can be an issue for throughput under a lot of requests.
Also, I'm not fully sure why you are not testing the same code on both IIS and Kestrel; I would assume only environmental changes are needed, not code changes.
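For completeness, a sketch of the asynchronous version of that loop (same variables as in the snippet above):

// Sketch: async reads keep threads free while waiting on the network.
var bytes = await stream.ReadAsync(read, 0, DataChunkSize);
while (bytes > 0)
{
    buffer += System.Text.Encoding.UTF8.GetString(read, 0, bytes);
    bytes = await stream.ReadAsync(read, 0, DataChunkSize);
}

Note that the repeated string concatenation also allocates heavily per request; accumulating the bytes and decoding once at the end, or using a StringBuilder, would reduce GC pressure.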
I'm trying to figure out why my web service is so slow and find ways to get it to respond faster. The current average response time, without custom processing involved (i.e. an ApiController action returning a very simple object), is about 75 ms.
The setup
Machine:
32 GB RAM, SSD disk, 4 x 2.7 GHz CPUs, 8 logical processors, x64 Windows 10
Software:
1 asp.net mvc website running .net 4.0 on IISEXPRESS (System.Web.Mvc v5.2.7.0)
1 asp.net web api website running .net 4.0 on IISEXPRESS (System.Net.Http v4.2.0.0)
1 RabbitMQ messagebus
Asp.net Web API Code (Api Controller Action)
[Route("Send")]
[HttpPost]
[AllowAnonymous)
public PrimitiveTypeWrapper<long> Send(WebsiteNotificationMessageDTO notification)
{
_messageBus.Publish<IWebsiteNotificationCreated>(new { Notification = notification });
return new PrimitiveTypeWrapper<long>(1);
}
The body of this method takes 2 ms. Stackify tells me there's a lot of overhead in the AuthenticationFilterResult.ExecuteAsync method, but since it's an ASP.NET internal I don't think it can be optimized much.
Asp.net MVC Code (MVC Controller Action)
The RestClient implementation is shown below. The HttpClientFactory returns a new HttpClient instance with the necessary headers and base path.
public async Task<long> Send(WebsiteNotificationMessageDTO notification)
{
    var result = await _httpClientFactory.Default.PostAndReturnAsync<WebsiteNotificationMessageDTO, PrimitiveTypeWrapper<long>>("/api/WebsiteNotification/Send", notification);
    if (result.Succeeded)
        return result.Data.Value;
    return 0;
}
Executing 100 requests as fast as possible on the backend rest service:
[HttpPost]
public async Task SendHundredNotificationsToMqtt()
{
    var sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < 100; i++)
    {
        await _notificationsRestClient.Send(new WebsiteNotificationMessageDTO()
        {
            Severity = WebsiteNotificationSeverity.Informational,
            Message = "Test notification " + i,
            Title = "Test notification " + i,
            UserId = 1
        });
    }
    sw.Stop();
    Debug.WriteLine("100 messages sent, took {0} ms", sw.ElapsedMilliseconds);
}
This takes on average 7.5 seconds.
Things I've tried
Checked the number of available threads on both the REST service and the MVC website (see also the min-thread sketch after this list):
int workers;
int completions;
System.Threading.ThreadPool.GetMaxThreads(out workers, out completions);
which returned for both:
Workers: 8191
Completions: 1000
Removed all RabbitMQ message bus connectivity to ensure it's not the culprit. I've also removed the message bus publish call (_messageBus.Publish<IWebsiteNotificationCreated>(new { Notification = notification });) from the REST method, so all it does is return 1 inside a wrapping object.
The backend REST service uses Identity Framework with bearer token authentication; to eliminate most of that overhead I've also tried marking the controller action on the REST service as AllowAnonymous.
Ran the project in Release mode: No change
Ran the sample 100 requests twice to exclude service initialization cost: No change
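A note on the thread check in the list above: GetMaxThreads only reports the ceiling, which is almost never the constraint. Request latency is more often affected by the pool's minimum: above it, new threads are injected at roughly one per 500 ms, which can stall bursts of requests. A sketch to rule this out (the floor of 100 is illustrative, not a recommendation):

// Sketch: raise the thread pool floor so bursts don't wait on thread injection.
int minWorkers;
int minCompletions;
System.Threading.ThreadPool.GetMinThreads(out minWorkers, out minCompletions);
System.Threading.ThreadPool.SetMinThreads(Math.Max(minWorkers, 100), Math.Max(minCompletions, 100));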
After all these attempts the problem remains: it still takes roughly 75 ms per request. Is this as low as it goes?
Here's a Stackify log for the backend with the above changes applied.
The web service remains slow. Is this as fast as it can get without an expensive hardware upgrade, or is there something else I can look into to figure out what's making it this slow?
In my Play! 2.1 REST API app I have installed New Relic.
All of my controller actions inherit from a method that adds a timeout to the future of the response. If any such method takes longer than 20 seconds, the request is terminated and the result is a 5XX error.
The code is basically this:
val timeout = 20
action(request).orTimeout(
name + " backend timed-out after "+timeout+" seconds", timeout * 1000).map {
resultOrTimeout => { //... process response or timeout with fold
The problem I'm having is that when analyzing the data in New Relic, I'm seeing average response times of 20 seconds, always.
Looking at the trace, I can see that New Relic interprets the timeout function as the container of the response.
Slowest components Count Duration %
Async Wait 7 20,000 ms 100%
Action$$anonfun$apply$1.apply() 2 2 ms 0%
PlayDefaultUpstreamHandler$$an....apply() 1 1 ms 0%
PlayDefaultUpstream....$$anonfun$24.apply() 1 1 ms 0%
SmaugController$class.akkify() 1 0 ms 0%
PlayDefaultUpstreamHandler.handleAction$1() 1 0 ms 0%
Total 20,000 ms 100%
Is there any way I can prevent new-relic from considering that timeout?
Thanks!
EDIT: I expanded the transaction to get more information:
Duration (ms) Duration (%) Segment Drilldown Timestamp
20,000 100.00% HttpRequestDecoder.unfoldAndFireMessageReceived()
20,000 100.00% Async Wait
Stack trace
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1146)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:615)
java.lang.Thread.run (Thread.java:679)
107 0.53% SmaugController$class.akkify()
As you can see, the real work is done in the akkify method, which takes 107 ms; all the rest is consumed by the Async Wait call.
Unfortunately, it is currently not possible to ignore that particular timeout in New Relic.
However, the 3.4.1 release of the New Relic Java Agent supports the handle-timeouts sample code documented in Play 2.2.1: http://www.playframework.com/documentation/2.2.1/ScalaAsync
You can download it here: https://download.newrelic.com/newrelic/java-agent/newrelic-agent/3.4.1/
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future
import scala.concurrent.duration._

def index = Action.async {
  val futureInt = scala.concurrent.Future { intensiveComputation() }
  val timeoutFuture = play.api.libs.concurrent.Promise.timeout("Oops", 1.second)
  Future.firstCompletedOf(Seq(futureInt, timeoutFuture)).map {
    case i: Int => Ok("Got result: " + i)
    case t: String => InternalServerError(t)
  }
}