How to safely close a connection in Trio? - asynchronous

I have the application logic for a connection all wrapped in a big try/except/finally block:
async def serve(self, stream: trio.SocketStream):
try:
async with trio.open_nursery() as nursery:
pass # code goes here to do some reading and writing
except Exception as e:
print("Got exception:", e)
except trio.Cancelled:
print("Cancelled")
raise
# some other handlers, mostly just logging
finally:
# ... some other stuff ...
# ... then shut the connection:
with trio.move_on_after(1):
await stream.aclose()
I'm not clear whether I should try to close the connection in the finally clause. If the exception is due to an application error (invalid message from the connection, or coroutine cancelled) this seems like the right thing to do, but if the exception is that the connection was closed from the other end then this seems counter productive - I expect the old exception would get masked by a new one. Any thoughts? I could maybe wrap it in its own try/except and just ignore any exceptions.

On the python-trio/general Gitter chat, Joshua Oreman #oremanj kindly offered this answer:
Trio distinguishes between local and remote close. If a connection got closed by the remote side, it's still fine to aclose() it. (Precisely for this reason)
(If it got aclose()'d locally, it's also fine to aclose() it again. It's a noop (except for executing a checkpoint) in either case.)
You get ClosedResourceError if you use the connection after you closed it (this is a programming error that you should fix), as opposed to BrokenResourceError if you use the connection after some problem occurred on the other side (this should be expected sometimes due to the vagaries of networks).

Related

How to get visibility into completion queue on C++ gRPC server

Note: Help with the immediate problem would be great, but mostly I'm looking for advice on troubleshooting gRPC timing issues in general (this isn't my first such issue).
I am adding a new server streaming service to a C++ module which has an existing server streaming service, and the two appear to be conflicting. Specifically, the completion queue Next() call on the server is crashing intermittently after the C# client calls Cancel() on the cancellation token for one of the services. This doesn't happen if I run each service independently.
On the client, I get this at the response stream MoveNext() call:
System.InvalidOperationException
HResult=0x80131509
Message=Shutdown has already been called
Source=Grpc.Core
StackTrace:
at Grpc.Core.Internal.CompletionQueueSafeHandle.BeginOp()
at Grpc.Core.Internal.CallSafeHandle.StartReceiveMessage(IReceivedMessageCallback callback)
at Grpc.Core.Internal.AsyncCallBase`2.ReadMessageInternalAsync()
at Grpc.Core.Internal.ClientResponseStream`2.<MoveNext>d__5.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at MyModule.Connection.<DoSubscriptionReceives>d__7.MoveNext() in C:\snip\Connection.cs:line 67
On the server, I get this at the completion queue next() call:
Exception thrown: read access violation.
core_cq_tag->**** was 0xDDDDDDDD.
The stack trace:
MyModule.exe!grpc_impl::CompletionQueue::AsyncNextInternal(void * * tag, bool * ok, gpr_timespec deadline) Line 59 C++
> MyModule.exe!grpc_impl::CompletionQueue::Next(void * * tag, bool * ok) Line 176 C++
...snip...
It appears something is being added to the queue after shutdown. The difficulty is I have little visibility into what is being added into the queue and in what order.
I'm trying to write a server-side interceptor to log all requests & responses, but there seems to be no documentation. So far, poking through the API hasn't gotten me very far. Is there any documentation available on wiring up an interceptor in C++? Or, are there other approaches for troubleshooting timing conflicts between services?
Windows 11, Grpc.Core 1.27
What I've tried:
I first played with the GRPC_TRACE & GRPC_VERBOSITY environment variables. I was able to get some unhelpful output from the client, but nothing from the server. Of course, there's been lots of debugging, stripping the client & server down to barebones, disabling keep alives, ensuring we aren't using deadlines, having the services share a cancellation token, etc.
Update: I have found that the crash only happens when the client is run from an NUnit test. In that environment, the completion queue is getting more hits on Next(), but I'm still trying to figure out where they are coming from.
Is 1.27 the version you are using? That seems pretty old.. There might have been fixes since then.
For using the C++ server interception API, I think you would find this very useful - https://github.com/grpc/grpc/blob/0f2a0f5fc9b9e9b9c98d227d16575d106f1e8d43/test/cpp/end2end/server_interceptors_end2end_test.cc#L48
One suggestion I have is to run the code another sanitizers https://github.com/google/sanitizers to make sure that we don't have a heap-use-after-free type bug.
I would also check for API misuse issues. (If you had posted the code, I could've given a look to see if anything seems weird..)

Blazor WebSocket closed with status code 1006

I have made a Blazor app, which is working well locally. When I put it on the server, quite often (when app uses DB context) I get this error :
Error: Connection disconnected with error 'Error: WebSocket closed with status code: 1006 ()
The user have to refresh the page, which is really annoying. You can't use app working in this way.
I have found a lot of discussions on this error, lot of plans...and almost everything is older than one year. I would expect the solution already, but haven't found anything.
Anyone knows, why is this happening and how to figure it out in the Blazor app? At least to catch this error and wait until the connection is back, so the page is not getting faded?
So far I was able to do only automatic reloading of page by javascript, when I get this error. But anyway, I can't use this solution in production, because the page is down for a second and it doesn't look good. I need to catch it before and keep the page active.
Thank you.
FYI , Have you check somewhere on the server-side like the logic that use data from DB context or the security/config in the production between IIS server and DB? If you are sure that it's come from the DB context then have you validate by test other possibility like make the test method that have long delay time/or mockup method that return the data to check whether the error still occur?
I once have a really stupid code in an object. but the code is build with no error. But it clash on runtime with no clue relate to the problem.
private string _oh;
public string oh
{
get { return _oh; }
set { oh= value; // cause infinite loop > should _oh
}
}
The worst part is that the error throwed is the same message as this question, So I quite sure there is the root clause elsewhere.
Error: Connection disconnected with error 'Error: WebSocket closed with status code: 1006 ()

Timing issue a C++/winRT BLE connection attempt?

I am using C++/winRT UWP to discover and connect to Bluetooth Low Energy devices. I am using the advertisment watcher to look for advertisements from devices I can support. This works.
Then I pick one to connect to. The connection procedure is a little weird by my way of thinking but according to the microsoft docs one Calls this FromBluetoothAddressAsync() with the BluetoothAddress and two things happen; one gets the BluetoothLEDevice AND a connection attempt is made. One needs to register a handler for the connection status changed event BUT you can't do that until you get the BluetoothLEDevice.
Is there a timing issue causing the exception? Has the connection already happened BEFORE I get the BluetoothLEDevice object? Below is the code and below that is the log:
void BtleHandler::connectToDevice(BluetoothLEAdvertisementReceivedEventArgs eventArgs)
{
OutputDebugStringA("Connect to device called\n");
// My God this async stuff took me a while to figure out! See https://msdn.microsoft.com/en-us/magazine/mt846728.aspx
IAsyncOperation<Windows::Devices::Bluetooth::BluetoothLEDevice> async = // assuming the address type is how I am to behave ..
BluetoothLEDevice::FromBluetoothAddressAsync(eventArgs.BluetoothAddress(), BluetoothAddressType::Random);
bluetoothLEDevice = async.get();
OutputDebugStringA("BluetoothLEDevice returned\n");
bluetoothLEDevice.ConnectionStatusChanged({ this, &BtleHandler::onConnectionStatusChanged });
// This method not only gives you the device but it also initiates a connection
}
The above code generates the following log:
New advertisment/scanResponse with UUID 00001809-0000-1000-8000-00805F9B34FB
New ad/scanResponse with name Philips ear thermometer and UUID 00001809-0000-1000-8000-00805F9B34FB
Connect to device called
ERROR here--> onecoreuap\drivers\wdm\bluetooth\user\winrt\common\bluetoothutilities.cpp(509)\Windows.Devices.Bluetooth.dll!03BEFDD6: (caller: 03BFB977) ReturnHr(1) tid(144) 80070490 Element not found.
ERROR here--> onecoreuap\drivers\wdm\bluetooth\user\winrt\device\bluetoothledevice.cpp(428)\Windows.Devices.Bluetooth.dll!03BFB9B7: (caller: 03BFAF01) ReturnHr(2) tid(144) 80070490 Element not found.
BluetoothLEDevice returned
Exception thrown at 0x0F5CDF2F (WindowsBluetoothAdapter.dll) in BtleScannerTest.exe: 0xC0000005: Access violation reading location 0x00000000.
It sure looks like there is a timing issue. But if it is, I have no idea how to resolve it. I cannot register for the event if I don't have a BluetoothLEDevice object! I cannot figure out a way to get the BluetoothLEDevice object without invoking a connection.
================================ UPDATE =============================
Changed the methods to IAsyncAction and used co_await as suggested by #IInspectable. No difference. The problem is clearly that the registered handler is out of scope or something is wrong with it. I tried a get_strong() instead of a 'this' in the registration, but the compiler would not accept it (said identifier 'get_strong()' is undefined). However, if I commented out the registration, no exception is thrown but I still get these log messages
onecoreuap\drivers\wdm\bluetooth\user\winrt\common\bluetoothutilities.cpp(509)\Windows.Devices.Bluetooth.dll!0F27FDD6: (caller: 0F28B977) ReturnHr(3) tid(253c) 80070490 Element not found.
onecoreuap\drivers\wdm\bluetooth\user\winrt\device\bluetoothledevice.cpp(428)\Windows.Devices.Bluetooth.dll!0F28B9B7: (caller: 0F28AF01) ReturnHr(4) tid(253c) 80070490 Element not found.
But the program continues to run an I continue to discover and connect. But since I can't get the connection event it is kind of useless at this stage.
I hate my answer. But after asynching and co-routining everything under the sun, the problem is unsolvable by me:
This method
bluetoothLEDevice = co_await BluetoothLEDevice::FromBluetoothAddressAsync(eventArgs.BluetoothAddress(), BluetoothAddressType::Random);
returns NULL. That should not happen and there is not much I can do about it. I read that as a broken BLE API.
A BTLE Central should be able to do as follows
Discover a device if new then:
If user selects connect, connect to
the device
perform service discovery
read/write/enable
characteristics as needed
handle indications/notifications
If at any time the peripheral sends a security request or insufficient authentication error, start pairing
repeat the action that caused the insufficient authentication.
On disconnect, save the paired and bonded state if the device is pairable.
On rediscovery of the device, if unpaired (not a pairable device)
repeat above
If paired and bonded
start encryption
work with the device; no need to re-enable or do service discovery
========================= MORE INFO ===================================
This is what the log shows when the method is called
Connect to device called
onecoreuap\drivers\wdm\bluetooth\user\winrt\common\bluetoothutilities.cpp(509)\Windows.Devices.Bluetooth.dll!0496FDD6: (caller: 0497B977) ReturnHr(1) tid(3b1c) 80070490 Element not found.
onecoreuap\drivers\wdm\bluetooth\user\winrt\device\bluetoothledevice.cpp(428)\Windows.Devices.Bluetooth.dll!0497B9B7: (caller: 0497AF01) ReturnHr(2) tid(3b1c) 80070490 Element not found.
BluetoothLEDevice returned is NULL. Can't register
Since the BluetoothLEDevice is NULL, I do not attempt to register.
================= MORE INFO ===================
I should also add that taking an over-the-air sniff reveals that there is never a connection event. Though the method is supposed to initiate a connection as well as return the BluetoothLEDevice object, it ends up doing neither. My guess is that the method requires more pre-use setup of the system that only the DeviceWatcher does. The AdvertisementWatcher probably does not.
In BLE you always have to wait for every operation to complete.
I am not an expert in C++, but in C# the async connection procedure returns a bool if it was successful.
In C++ the IAsyncOperation does not have a return type, so there is no way to know if the connection procedure was successful or completed.
You will have to await the IAsyncOperation and make sure that you have a BluetoothLEDevice object, before you attach the event handler.
To await an IAsyncOperation there is a question/answer on how to await anIAsyncOperation:
How to wait for an IAsyncAction? How to wait for an IAsyncAction?

Connection pooling - closed connection issue

Problems:
I have a web app that is getting battered. Everything is working but I get about two errors a day I can not explain. It could be the first user hitting the system because the error appears in the error log intermittently it occurred today around 7:30 am in servers time zone (same as the users).
I fixed an old issue by implicitly destroying the connection by way of the using statement as it only marks the connection as recyclable.
Error Log:
Message: 3df8d62a-6b47-479e-b5d2-b23e0db92d92 : System.InvalidOperationException: Invalid operation. The connection is closed.
at System.Data.SqlClient.SqlConnection.GetOpenConnection()
at System.Data.SqlClient.SqlConnection.get_ServerVersion()
at System.Data.Linq.SqlClient.SqlProvider.get_IsServer2005()
at System.Data.Linq.SqlClient.SqlProvider.InitializeProviderMode()
at System.Data.Linq.SqlClient.SqlProvider.System.Data.Linq.Provider.IProvider.Execute(Expression query)
at System.Data.Linq.DataQuery`1.System.Linq.IQueryProvider.Execute[S](Expression expression)
at System.Linq.Queryable.SingleOrDefault[TSource](IQueryable`1 source)
at xxxxx.Resources.Data.LINQ.DistrictService.GetDistrict(Int32 districtID)
Related Code:
//--------------------------------------------------------------------------------------------
public Domain.Data.District GetDistrict(int districtID)
{
using (_dataContext = new DistrictDataDataContext(_systemService.GetCurrentSystem().WriteOnlyDatabase.ConnectionString))
{
Mapper.CreateMap<District, Domain.Data.District>();
return (from s in _dataContext.Districts
where s.DistrictID == districtID
select Mapper.Map<District, Domain.Data.District>(s)).SingleOrDefault();
}
}
Note:
I think that this is a non-issue because the inherent behavior of the connection pooling might be firing a retry (this could just be information saying "hey we tried what you asked for but the connection was closed, we will now open one up it.") however, If this is the case, I do not get a confirmation of successful retry and I am not even sure would be expected. We have had no complaints and this cant be duplicated in house by our QA staff. I hope maybe someone else has come across this.

Why do I get exception "The execution of the InstancePersistenceCommand named LoadWorkflowByInstanceKey was interrupted by an error"

After doing some refactoring to my WF4 service, I got this exception when calling some of the operations:
The execution of the InstancePersistenceCommand named {urn:schemas-microsoft-com:System.Activities.Persistence/command}LoadWorkflowByInstanceKey was interrupted by an error.
My xamlx file contains a few receive/sendreplytoreceive pairs, as shown below. The exception sometimes happens on receive2, sometimes receive3.
receive1 (no correlation, cancreateinstance=true)
send reply to receive (initializes content correlation on generated ID)
receive2 (correlates on ID, cancreateinstance=false)
send reply to receive
receive 3 (correlates on ID, cancreateinstance=false)
send reply to receive
After doing a lot of debugging and making sure all correlations where set up right, the exception disappeared for new instances of the workflow.
What does the exception mean, and why did it show up and why did it dissappear all of a sudden? Is it a code/xamlx issue or something with the infrastructure (AppFabric/SQL)?
I'm hosting the WF service with IIS/AppFabric, using AppFabric' SQL persistence.
According to this support note this error can be the result of a race condition between the Receive and a Delay activity expiring. Is this possible in your workflow.
I kinda figured mine out... aparently if you point your persistance store in a SQL previous to 2012 you get the error... so all i had to do is put mine persistance store in a SQL 2012...
When I had this problem it turned out to be a mistake in my connection string when instantiating the persistence store object.
SqlWorkflowInstanceStore store = new SqlWorkflowInstanceStore(connStr);
I realise this an old question but fixing the connection string got rid of my error while running store.Execute() so I thought I'd share!

Resources