ServiceBus in Microsoft Orleans, OrleansPrepareFailedException exception - .net-core

I'm using Microsoft Orleans for .NET Core and I'm trying to receive ServiceBus messages and process them as fast as I can.
With the MaxConcurrentCalls parameter set to 2, everything works fine. But when it is set to 10 or 30, it throws an exception:
OrleansPrepareFailedException, Transaction 50038 aborted because Prepare phase did not succeed
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MichalBialecki.com.OrleansCore.AccountTransfer.Client.Program.<>c__DisplayClass4_0.<b__0>d.MoveNext()
The code looks like this:
subscriptionClient.RegisterMessageHandler(
    async (message, token) =>
    {
        var messageJson = Encoding.UTF8.GetString(message.Body);
        var updateMessage = JsonConvert.DeserializeObject<AccountTransferMessage>(messageJson);
        await client.GetGrain<IAccountGrain>(updateMessage.From).Withdraw(updateMessage.Amount);
        await client.GetGrain<IAccountGrain>(updateMessage.To).Deposit(updateMessage.Amount);
        await subscriptionClient.CompleteAsync(message.SystemProperties.LockToken);
    },
    new MessageHandlerOptions(async args => Console.WriteLine(args.Exception + ", stack trace: " + args.Exception.StackTrace))
    { MaxConcurrentCalls = 30, AutoComplete = false });
My scenario is very simple. It handles account transfer messages and, after updating the account (grain) balance, sends a message to a different ServiceBus topic. Currently, on my local machine, it can handle around 1500 messages per minute, but that feels kind of slow.

The problem was that I mishandled the state in a grain class. I used transactional state and persistent state at the same time, where I should have used only one of them. I managed to get my code running correctly with Orleans version 2.0 in a .NET Core application.
Here is my code: https://github.com/mikuam/orleans-core-example
And here is my blog post about adding persistent storage to Microsoft Orleans in .Net Core: http://www.michalbialecki.com/2018/04/03/add-cosmosdb-persistent-storage-to-microsoft-orleans-in-net-core/

Related

.Net core gRPC unmanaged memory usage

I have a simple service that listens to RabbitMQ, calls some gRPC services, and does other things such as DB updates. I started noticing Kubernetes pods failing with an OOM exception, so I took all the logic out of the Consume method and the controller constructor and started adding constructor parameters back one by one. Everything is normal until I pass my gRPC client to the constructor; then unmanaged memory starts going up with each new message and never goes down (the Consume method is still empty). Any ideas what I'm doing wrong?
My client is added in Program.cs like this:
services.AddGrpcClient<Notification.Api.Notification.NotificationClient>(o => { o.Address = new Uri(sso.GrpcOptions.ApiBaseAddress); })
    .ConfigureChannel(c =>
    {
        c.LoggerFactory = LoggerFactory.Create(logging =>
        {
            logging.AddConsole();
            logging.SetMinimumLevel(LogLevel.Warning);
        });
        c.HttpHandler = new SubdirectoryHandler(new HttpClientHandler()
        {
        }, sso.GrpcOptions.NotifSubdirectory);
    });
Eventually I added gRPC clients with services.AddSingleton() instead of using services.AddGrpcClient() and it fixed the increasing memory consumption.

.NET Core 3.1 Console Application hangs

I have a .NET Core 3.1 console application that runs some background cron jobs. There are different jobs doing different things; each one starts every x minutes, does its thing, and then stops. Most of the time they run fine, but lately these jobs have started to hang. They just never finish, and the process is stuck.
I don't really know how to debug this or how to figure out what is causing it to hang. What I've done is:
Create a DMP file from the task manager
Load this using dotnet dump analyze myfile.DMP
Run dumpasync -stacks
This gives a list of all the stacks. I've created a few DMP files from different workers (doing different things), but they all have in common that there is one task on Npgsql.NpgsqlReadBuffer.
This is one example:
000001ed01aa18c8 00007ffca0c9e3d8 128 1 Npgsql.NpgsqlReadBuffer+<>c__DisplayClass34_0+<<Ensure>g__EnsureLong|0>d
Async "stack":
.000001ed01aa1988 (1) Npgsql.NpgsqlConnector+<>c__DisplayClass160_0+<<DoReadMessage>g__ReadMessageLong|0>d
..000001ed01aa1a40 (5) Npgsql.NpgsqlDataReader+<NextResult>d__44
...000001ed01aa1ae0 (0) Npgsql.NpgsqlCommand+<ExecuteReaderAsync>d__102
....000001ed01aa1b90 (0) Npgsql.NpgsqlCommand+<ExecuteDbDataReaderAsync>d__97
.....000001ed01aa1c10 (1) Dapper.SqlMapper+<QueryAsync>d__33`1[[System.__Canon, System.Private.CoreLib]]
......000001ed021ac5d8 (3) Acme.Common.Data.Dapper.Repositories.AccountItems.GetDapperAccountItemsHandlerSql+<GetAccountItemsAsync>d__3
.......000001ed021ac638 (0) Acme.Common.Data.Dapper.Repositories.ItemRepository`1+<GetAccountItemsHigherThanIdAsync>d__9[[Acme.Core.Db.Dapper.DapperReaderConnection, Acme.Core.Db.Dapper]]
........000001ed021ac698 (1) Acme.Common.Services.EmailReport.ReportDataService+<MakeInstantAlertDto>d__20
.........000001ed00badd90 (3) Acme.Common.Services.EmailReport.ReportDataService+<GetReportDtoAsync>d__19
..........000001ed0105f968 (2) Acme.Common.Services.EmailReport.InstantAlertReportService+<SendInstantAlertReportAsync>d__6
...........000001ed0105f9c8 (0) Acme.Common.Services.EmailReport.EmailReportWorkerService+<SendInstantAlertReportsAsync>d__10
............000001ed01b902d0 System.Threading.Tasks.TaskFactory+CompleteOnInvokePromise
I don't know if this means that npgsql is the cause of the hang, but it seems to be what is common between all of them.
The connection is created like this:
public async Task<IEnumerable<MyDto>> GetData()
{
    using (var dbConnection = await _dapperConnection.OpenAsync())
    {
        var sql = "SELECT * FROM ....";
        var result = await dbConnection.QueryAsync<MyDto>(sql);
        return result;
    }
}

private async Task<NpgsqlConnection> OpenAsync(CancellationToken cancellationToken = default)
{
    var connection = new NpgsqlConnection(_connectionString);
    await connection.OpenAsync(cancellationToken);
    return connection;
}
The connection string looks like this:
User ID=<userid>;Password=<password>;Host=<host>;Port=5432;Database=<databasename>;Pooling=true;Maximum Pool Size=200;Keepalive=30;
How can I debug this further? What would help?
Further technical details
Npgsql version: 4.1.3
PostgreSQL version: 9.6
Operating system: Windows

Realm doesn’t work with xUnit and .NET Core

I’m having issues running Realm with xUnit and .NET Core. Here is a very simple test that I want to run:
public class UnitTest1
{
    [Scenario]
    public void Test1()
    {
        var realm = Realm.GetInstance(new InMemoryConfiguration("Test123"));
        realm.Write(() =>
        {
            realm.Add(new Product());
        });
        var test = realm.All<Product>().First();
        realm.Write(() => realm.RemoveAll());
    }
}
I get different exceptions on different machines (Windows and Mac) on the line where I try to create a Realm instance with InMemoryConfiguration.
On Mac I get the following exception
libc++abi.dylib: terminating with uncaught exception of type realm::IncorrectThreadException: Realm accessed from incorrect thread.
On Windows I get the following exception when running
ERROR Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. at
System.Net.Sockets.NetworkStream.Read(Span`1 destination) at
System.Net.Sockets.NetworkStream.ReadByte() at
System.IO.BinaryReader.ReadByte() at
System.IO.BinaryReader.Read7BitEncodedInt() at
System.IO.BinaryReader.ReadString() at
Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable() at
Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken) Source: System.Net.Sockets HResult: -2146232800 Inner Exception: An existing connection was forcibly closed by the remote host HResult: -2147467259
I’m using Realm 3.3.0 and xUnit 2.4.1
I’ve tried downgrading to Realm 2.2.0, and it didn’t work either.
The solution to this problem was found in a GitHub post.
This is the piece of code from that post that helped me solve the issue:
Realm GetInstanceWithoutCapturingContext(RealmConfiguration config)
{
    var context = SynchronizationContext.Current;
    SynchronizationContext.SetSynchronizationContext(null);
    Realm realm = null;
    try
    {
        realm = Realm.GetInstance(config);
    }
    finally
    {
        SynchronizationContext.SetSynchronizationContext(context);
    }
    return realm;
}
It took me a while, though, to apply this to my solution.
First and foremost, instead of just setting the context to null, I am using Nito.AsyncEx.AsyncContext, because otherwise automatic changes will not be propagated across threads: Realm needs a non-null SynchronizationContext for that feature to work. So, in my case, the method looks something like this:
public class MockRealmFactory : IRealmFactory
{
    private readonly SynchronizationContext _synchronizationContext;
    private readonly string _defaultDatabaseId;

    public MockRealmFactory()
    {
        _synchronizationContext = new AsyncContext().SynchronizationContext;
        _defaultDatabaseId = Guid.NewGuid().ToString();
    }

    public Realm GetRealmWithPath(string realmDbPath)
    {
        var context = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(_synchronizationContext);
        Realm realm;
        try
        {
            realm = Realm.GetInstance(new InMemoryConfiguration(realmDbPath));
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(context);
        }
        return realm;
    }
}
This fixed a lot of failing unit tests, but I was still receiving the same exception: Realm accessed from incorrect thread. I had no clue why, because everything was set up correctly. Then I found that the failing tests were related to methods that used the async Realm API, in particular realm.WriteAsync. After some more digging, I found the following lines in the Realm documentation:
It is not a problem if you have set SynchronisationContext.Current but
it will cause WriteAsync to dispatch again on the thread pool, which
may create another worker thread. So, if you are using Current in your
threads, consider calling just Write instead of WriteAsync.
In my code there was no direct need to use the async API. I removed it and replaced it with the synchronous Write, and all the tests became green again! I guess that if I find myself in a situation where I do need the async API, for example for bulk insertions, I'd either mock that specific API or replace it with my own background thread using Task.Run instead of Realm's version.
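Both snippets above follow the same shape: remember the current SynchronizationContext, install another one, create the Realm, and restore the original in a finally block. As a minimal sketch, that pattern could be pulled into a small helper. The names SynchronizationContextScope and Run are hypothetical and are not part of Realm or Nito.AsyncEx; the helper uses only System.Threading.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Realm or Nito.AsyncEx): runs a factory with a
// given SynchronizationContext installed, then restores the previous one.
public static class SynchronizationContextScope
{
    public static T Run<T>(SynchronizationContext context, Func<T> factory)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));

        // Remember whatever context the calling thread had (may be null).
        var previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            return factory();
        }
        finally
        {
            // Always restore the previous context, even if the factory throws.
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

With a helper like this, the body of GetRealmWithPath could probably shrink to a single call such as SynchronizationContextScope.Run(_synchronizationContext, () => Realm.GetInstance(new InMemoryConfiguration(realmDbPath))).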

Diagnosing performance issue with asp.net web api

I'm trying to figure out why my webservice is so slow and find ways to get it to respond faster. Current average response time without custom processing involved (i.e. apicontroller action returning a very simple object) is about 75ms.
The setup
Machine:
32 GB RAM, SSD disk, 4 x 2.7 GHz CPUs, 8 logical processors, x64 Windows 10
Software:
1 asp.net mvc website running .net 4.0 on IISEXPRESS (System.Web.Mvc v5.2.7.0)
1 asp.net web api website running .net 4.0 on IISEXPRESS (System.Net.Http v4.2.0.0)
1 RabbitMQ messagebus
Asp.net Web API Code (Api Controller Action)
[Route("Send")]
[HttpPost]
[AllowAnonymous]
public PrimitiveTypeWrapper<long> Send(WebsiteNotificationMessageDTO notification)
{
    _messageBus.Publish<IWebsiteNotificationCreated>(new { Notification = notification });
    return new PrimitiveTypeWrapper<long>(1);
}
The body of this method takes 2ms. Stackify tells me there's a lot of overhead on the AuthenticationFilterResult.ExecuteAsync method but since it's an asp.net thing I don't think it can be optimized much.
Asp.net MVC Code (MVC Controller Action)
The RestClient implementation is shown below. The HttpClientFactory returns a new HttpClient instance with the necessary headers and basepath.
public async Task<long> Send(WebsiteNotificationMessageDTO notification)
{
    var result = await _httpClientFactory.Default.PostAndReturnAsync<WebsiteNotificationMessageDTO, PrimitiveTypeWrapper<long>>("/api/WebsiteNotification/Send", notification);
    if (result.Succeeded)
        return result.Data.Value;
    return 0;
}
Executing 100 requests as fast as possible on the backend rest service:
[HttpPost]
public async Task SendHundredNotificationsToMqtt()
{
    var sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < 100; i++)
    {
        await _notificationsRestClient.Send(new WebsiteNotificationMessageDTO()
        {
            Severity = WebsiteNotificationSeverity.Informational,
            Message = "Test notification " + i,
            Title = "Test notification " + i,
            UserId = 1
        });
    }
    sw.Stop();
    Debug.WriteLine("100 messages sent, took {0} ms", sw.ElapsedMilliseconds);
}
This takes on average 7.5 seconds.
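Because every Send call is awaited before the next one starts, the 100 requests run strictly one after another, so 7.5 seconds in total works out to 7500 ms / 100 = 75 ms per request. A minimal sketch of a timing helper that reports both numbers follows. SequentialTimer, MeasureAsync and TimingResult are hypothetical names introduced for illustration; the operation passed in stands for any async call, such as _notificationsRestClient.Send.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical result type: total elapsed time and the per-call average.
public sealed class TimingResult
{
    public TimingResult(int count, double totalMs)
    {
        Count = count;
        TotalMs = totalMs;
    }

    public int Count { get; }
    public double TotalMs { get; }
    public double AverageMs => TotalMs / Count;
}

public static class SequentialTimer
{
    // Awaits `operation` `count` times, one call after another, and times the whole run.
    public static async Task<TimingResult> MeasureAsync(Func<int, Task> operation, int count)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            // Each call completes before the next one starts, as in the loop above.
            await operation(i);
        }
        sw.Stop();

        return new TimingResult(count, sw.Elapsed.TotalMilliseconds);
    }
}
```

Measuring this way only shows the sequential round-trip time per request; it does not say which part of the round trip (client, network, or server pipeline) the time is spent in.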
Things I've tried
Checked the number of available threads on both the REST service and the MVC website:
int workers;
int completions;
System.Threading.ThreadPool.GetMaxThreads(out workers, out completions);
which returned for both:
Workers: 8191
Completions: 1000
Removed all RabbitMQ messagebus connectivity to ensure it's not the culprit. I've also removed the messagebus publish method from the rest method _messageBus.Publish<IWebsiteNotificationCreated>(new { Notification = notification }); So all it does is return 1 inside a wrapping object.
The backend rest is using identity framework with bearer token authentication and to eliminate most of it I've also tried marking the controller action on the rest service as AllowAnonymous.
Ran the project in Release mode: No change
Ran the sample 100 requests twice to exclude service initialization cost: No change
After all these attempts, the problem remains: each request still takes roughly 75 ms. Is this as low as it goes?
Here's a stackify log for the backend with the above changes applied.
The web service remains slow. Is this as fast as it can get without an expensive hardware upgrade, or is there something else I can look into to figure out what is making it this slow?

LoadBalancingChannel exception on DocumentClient.Dispose()

We have hit an issue where after using CosmosDb, an exception occurs if we try to dispose of the DocumentClient shortly after. Waiting a few seconds before disposing causes no exceptions. We have confirmed that we are using await with every asynchronous call.
Pseudo-code:
using (DocumentClient documentClient = new DocumentClient(...params))
{
    IOrderedQueryable<T> query = documentClient.CreateDocumentQuery<T>(...params);
    IList<T> documents;
    using (IDocumentQuery<T> documentQuery = query.AsDocumentQuery())
    {
        documents = (await documentQuery.ExecuteNextAsync<T>()).ToList();
    }
    // Processing...
}
The exception states:
LoadBalancingChannel rntbd://[ip].documents.azure.com:[port]/ in use
The API that makes the call successfully returns before DocumentClient.Dispose is called (all of the documents are correctly returned).
Has anyone seen this exception before? A search revealed no hits.
This can happen if the DocumentClient is disposed while requests are still pending. This was addressed in SDK version 2.2.2; please upgrade to the latest SDK version.
