Whether to use TPL or async/await - asp.net

There is an existing third-party REST API that accepts one set of input and returns the corresponding output. (Think of it as Bing's geocoding service, which accepts an address and returns location details.)
I need to call this API multiple times (say 500-1000 times) for a single ASP.NET request, and each call may take close to 500 ms to return.
I can think of three approaches to doing this. I'd like your input on which is likely the best, with speed as the main criterion.
1. Using an HTTP request in a for loop
Write a simple for loop and, for each input, call the REST API and add the output to the result. This is probably by far the slowest, but there is no overhead from extra threads or context switching.
2. Using async and await
Use async and await to call the REST API. This could be efficient, as the thread continues to do other work while waiting for the REST call to return. The problem I am facing is that, per the recommendations, I should be using await all the way up to the topmost caller, which is not possible in my case. Not following that may lead to deadlocks in ASP.NET, as described here: http://msdn.microsoft.com/en-us/magazine/jj991977.aspx
3. Using the Task Parallel Library
Use Parallel.ForEach with the synchronous API to invoke the server in parallel, and use a ConcurrentDictionary to hold the results. This may, however, incur thread overhead.
Also, let me know if there is any other, better way to handle this. I understand people might suggest measuring the performance of each approach, but I would like to understand how others have solved this problem before.

The best solution is to use async and await, but in that case you will have to take it async all the way up the call stack to the controller action.
The for loop keeps it all sequential and synchronous, so it would definitely be the slowest solution. Parallel will block multiple threads per request, which will negatively impact your scalability.
Since the operation is I/O-based (calling a REST API), async is the most natural fit and should provide the best overall system performance of these options.
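A minimal sketch of what that can look like in a Web API controller action using Task.WhenAll fan-out; the controller name, the example endpoint URL, and GetLocationAsync are placeholders for your actual third-party client, not any specific API:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;

public class GeocodeController : ApiController
{
    private static readonly HttpClient client = new HttpClient();

    public async Task<IHttpActionResult> Post(IList<string> addresses)
    {
        // Start all the calls without blocking the request thread...
        var tasks = addresses.Select(a => GetLocationAsync(a));

        // ...and asynchronously wait for all of them to complete.
        var locations = await Task.WhenAll(tasks);

        return Ok(locations);
    }

    private static async Task<string> GetLocationAsync(string address)
    {
        // Placeholder endpoint; substitute the real API URL and response parsing.
        return await client.GetStringAsync(
            "https://api.example.com/geocode?q=" + Uri.EscapeDataString(address));
    }
}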

First, I think it's worth considering some issues that you didn't mention in your question:
500-1000 API calls sounds like quite a lot. Isn't there a way to avoid that? Doesn't the API have some kind of bulk query functionality? Or can't you download their database and query it locally? (The more open organizations like Wikimedia or Stack Exchange often support this, the more closed ones like Microsoft or Google usually don't.)
If those options are not available, then at least consider some kind of caching, if that makes sense for you.
The number of concurrent requests that ASP.NET allows to the same server is only 10 by default. If you want to make more concurrent requests than that, you will need to set ServicePointManager.DefaultConnectionLimit.
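For example (100 is just an illustration; pick a value the provider can tolerate):
ServicePointManager.DefaultConnectionLimit = 100;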
Making this many requests could be considered abuse by the service provider and could lead to blocking of your IP. Make sure the provider is okay with this kind of usage.
Now, to your actual question: I think that the best option is to use async-await, even if you can't use it all the way. You can avoid deadlocks either by using ConfigureAwait(false) at every await (which is the correct solution) or by using something like Task.Run(() => /* your async code here */).Wait() to escape the ASP.NET context (which is the simple solution).
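Roughly, the two options look like this; httpClient, CallApiAsync and the example URL are assumed placeholders, not part of any particular library:
// The "correct" option: ConfigureAwait(false) on every await below the
// controller, so continuations don't try to re-enter the ASP.NET context.
// httpClient is assumed to be a shared HttpClient instance.
private static async Task<string> CallApiAsync(string input)
{
    var uri = "https://api.example.com/lookup?q=" + Uri.EscapeDataString(input);
    var response = await httpClient.GetAsync(uri).ConfigureAwait(false);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
}

// The "simple" option: escape the ASP.NET context by running the async code
// on the thread pool and blocking the request thread for its result.
string result = Task.Run(() => CallApiAsync(input)).GetAwaiter().GetResult();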
Using something like Parallel.ForEach() is not great, because it unnecessarily wastes ThreadPool threads.
If you go with async, you should probably also consider throttling. A simple way to achieve that is by using SemaphoreSlim.
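For example, a sketch that caps the number of in-flight calls at 20 (the limit, the inputs collection, and the CallApiAsync helper from the sketch above are assumptions to adapt):
var throttle = new SemaphoreSlim(20);   // at most 20 concurrent API calls

var tasks = inputs.Select(async input =>
{
    await throttle.WaitAsync();
    try
    {
        return await CallApiAsync(input);
    }
    finally
    {
        throttle.Release();
    }
});

var results = await Task.WhenAll(tasks);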

Related

Client callbacks with protobuf-net.Grpc

I'm currently working on replacing an old WCF client/server pairing with gRPC, and decided to use protobuf-net.Grpc as we've used protobuf-net extensively elsewhere in our codebase. However, I'm running into a bit of trouble with one particular portion.
Part of the original service is a Subscribe method which uses IClientCallback to effectively send an event to the client. Looking at regular gRPC, it seems like this would be possible (though a bit hacky) using a server streaming method and storing the IServerStreamWriter object on the server, writing to it whenever we wanted to "fire an event".
For the life of me, however, I can't quite figure out how to do something similar in protobuf-net.Grpc with the IAsyncEnumerable return type. The closest I can figure is using Task.Wait in a loop and updating some shared collection when I want to "fire" the event, which the loop would then check for and yield return. This doesn't seem like it would scale well, however, and there isn't really a good way to definitively unsubscribe when a client is no longer listening for events.
Is there some other/better way to do this?
Use a Channel<T>, which can be exposed via AsAsyncEnumerable() - it then essentially acts as a queue on the producer side and a sequence on the consumer side.
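For illustration, here is a rough server-side sketch using System.Threading.Channels directly; the service shape, the StatusEvent type, and how the cancellation token is obtained are assumptions, not the protobuf-net.Grpc API itself:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

public class StatusEvent { /* whatever your event payload is */ }

public class SubscriptionService
{
    // One channel per subscriber; "firing an event" is just writing to it.
    private readonly ConcurrentDictionary<Guid, Channel<StatusEvent>> subscribers =
        new ConcurrentDictionary<Guid, Channel<StatusEvent>>();

    // Hypothetical server-streaming method; the returned IAsyncEnumerable<T>
    // is what protobuf-net.Grpc streams to the client.
    public IAsyncEnumerable<StatusEvent> Subscribe(CancellationToken cancellationToken)
    {
        var channel = Channel.CreateUnbounded<StatusEvent>();
        var id = Guid.NewGuid();
        subscribers[id] = channel;

        // Clean up when the client stops listening.
        cancellationToken.Register(() =>
        {
            subscribers.TryRemove(id, out _);
            channel.Writer.TryComplete();
        });

        return channel.Reader.ReadAllAsync(cancellationToken);
    }

    // Called by whatever raises the "event" on the server.
    public void Publish(StatusEvent evt)
    {
        foreach (var channel in subscribers.Values)
        {
            channel.Writer.TryWrite(evt);
        }
    }
}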

.NET/C# - What is the best option instead of an ActionBlock<T> (or Channel<T>) for speed?

corefxlab has something called a Channel, which is a really nice implementation of an async producer-consumer queue and definitely does what I'm looking for. I'm curious whether there's an implementation with a similar API to ActionBlock<T>:
Must be able to accept/deny from multiple producers.
Only needs one consuming task, but preferably it should keep processing until the queue is empty, then 'wait' for new items.
A Channel<T> is much faster than a BufferBlock<T>, but I'm curious whether, given these specific requirements, there is something even faster.
According to a readme by Stephen Toub, Channels might end up being the underlying implementation beneath some of the Dataflow blocks. For async producer-consumer queue speed, Channels win.
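For reference, a single-consumer loop over a Channel<T> that behaves roughly like an ActionBlock<T> (drain everything available, then wait for new items) could look like the sketch below; WorkItem and Process are placeholders for your own item type and handler:
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<WorkItem>(
    new UnboundedChannelOptions { SingleReader = true }); // exactly one consumer

// Producers (any number): TryWrite always succeeds on an unbounded channel.
// channel.Writer.TryWrite(new WorkItem(...));

// The single consuming task: process until empty, then asynchronously wait.
async Task ConsumeAsync(ChannelReader<WorkItem> reader, CancellationToken token = default)
{
    while (await reader.WaitToReadAsync(token))
    {
        while (reader.TryRead(out var item))
        {
            Process(item); // placeholder handler
        }
    }
}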

Integrating both synchronous and asynchronous libraries

Can synchronous and asynchronous functions be integrated into one call/interface while maintaining static typing? If possible, can it remain neutral with respect to inheritance, i.e. not wrapping sync methods in async or vice versa (though this might be the best way)?
I've been reading around and see it's generally recommended to keep these separate (http://www.tagwith.com/question_61011_pattern-for-writing-synchronous-and-asynchronous-methods-in-libraries-and-keepin and Maintain both synchronous and asynchronous implementations). However, the reason I want to do this is that I'm creating a behaviour tree framework for the Dart language and am finding it hard to mix both sync and async 'nodes' together to iterate through. It seems these might need to be kept separate, meaning nodes that would suit a sync approach would have to be async, or the opposite, if they are to be within the same 'tree'.
I'm looking for a solution particularly for Dart lang, although I know this is firmly in the territory of general programming concepts. I'm open to this not being able to be achieved, but worth a shot.
Thank you for reading.
You can of course use sync and async functions together. What you can't do is go back to sync execution after a call to an async function.
Maintaining both sync and async versions is, in my opinion, mostly a waste of time. Sometimes sync versions are convenient so you don't have to invoke an async call for some simple operation, but in general async is an integral part of Dart. If you want to use Dart, you have to get used to it.
With the new async/await feature, you can write code that uses async functions almost the same way as code that uses only sync functions.

What logic must I cover in Collection.allow and Collection.deny to ensure it's secure?

So I've just started playing with Meteor and am trying to get my head around the security model. It seems there are two ways to modify data.
The Meteor.call way, which seems pretty standard - pretty much just a call to the server with its own set of business rules implemented.
Then there is the Collection.allow method, which seems very different from anything I've done before. It seems that if you put a Collection.allow in place, you're saying that the client can make any write operation to that collection as long as it can get past the validations in its allow function.
That makes me feel uneasy, because it feels like a lot of freedom, and my allow function would need to be pretty long to make sure it's locked down securely enough.
For instance, MongoDB has no schema, so you'd basically have to have a rule that defines which fields would be accepted and the format those fields must be in.
Wouldn't you also have to put in the business logic for every type of update that might be made to your system?
So say I had a SoccerTeam collection. There are several situations in which I may need to make a change: adding or removing a player, changing the team name, the team status changing, etc.
It seems to me that you'd have to put everything into this one massive function. That just sounds like a radical idea, and it seems Meteor.call methods would be a lot simpler.
Am I thinking about this in the wrong manner (or for the wrong use case)? Does anyone have an example of how they structure an allow or deny function, with a list of what I may need to check in my allow function to make my collection secure?
You are following the same line of reasoning I used in deciding how to handle data mutations when building Edthena. Out of the box, meteor provides you with the tools to make a simple tradeoff:
Do I trust the client and get a more responsive UI (latency compensation)? Or do I require strict control over data validation, but force the client to wait for an update?
I went with the latter, and exclusively used method calls for a few reasons:
I sleep better at night knowing there exists exactly one way to update each of my collections.
I found that some of my updates required side effects that only made sense to execute on the server (e.g. making denormalized updates to other collections).
At present, there isn't a clear benefit to latency compensation for our app. We found the delay for most writes was inconsequential to the user experience.
allow and deny rules are weak tools. They are essentially only good for validating ownership and other simple checks.
At the time when we first released to production (August 2013) this seemed like a radical conclusion. The meteor docs, the API, and the demos highlight the use of client-side writes, so I wasn't entirely sure I had made the right decision. A couple of months later I had my first opportunity to sit down with several of the meteor core devs - this is a summary of their reaction to my design choices:
This seems like a rational approach. Latency compensation is really useful in some contexts like mobile apps, and games, but may not be worth it for all web apps. It also makes for cool demos.
So there you have it. As of this writing, my advice for production apps would be to use client-side updates where you really need the speed, but you shouldn't feel like you are doing something wrong by making heavy use of methods.
As for the future, I'd imagine that post-1.0 we'll start to see things like built-in schema enforcement on both the client and server which will go a long way towards resolving my concerns. I see Collection2 as a significant first step in that direction, but I haven't tried it yet in any meaningful way.
stubs
A logical follow-up question is "Why not use stubs?". I spent some time investigating this but reached the conclusion that method stubbing wasn't useful for our project, for the following reasons:
I like to keep my server code on the server. Stubbing requires that I either ship all of my model code to the client or selectively repeat parts of it. In a large app, I don't see that as practical.
I found the overhead required to separate out what may or may not run on the client to be a maintenance challenge.
In order for the stub to do anything other than reject a database mutation, you'd need to have an allow rule in place - otherwise you'd end up with a lot of UI flicker (the client allows the write but the server immediately invalidates it). But having an allow rule defeats the whole point, because a user could still write to the db from the console.
The usual allow methods I have are these:
MyCollection.allow({
  insert: function () { return false; },
  update: function () { return false; },
  remove: function () { return false; }
});
And then I have methods which take care of all insertions. These methods perform the type checks and permission assessment. I have found that to be a much more maintainable approach: completely decoupling the data layer from the code that runs on the client.
Regarding "MongoDB has no schema, so you'd basically have to have a rule that defines which fields would be accepted and the format those fields must be in":
Take a look at Collection2. It supports schema checking at run-time before inserting documents into the collection.

Suggestions for doing async I/O with Task Parallel Library

I have some high-performance file transfer code which I wrote in C# using the Asynchronous Programming Model (APM) idiom (e.g., BeginRead/EndRead). This code reads a file from a local disk and writes it to a socket.
For best performance on modern hardware, it's important to keep more than one outstanding I/O operation in flight whenever possible. Thus, I post several BeginRead operations on the file, then when one completes, I call a BeginSend on the socket, and when that completes I do another BeginRead on the file. The details are a bit more complicated than that but at the high level that's the idea.
I've got the APM-based code working, but it's very hard to follow and probably has subtle concurrency bugs. I'd love to use TPL for this instead. I figured Task.Factory.FromAsync would just about do it, but there's a catch.
All of the I/O samples I've seen (most particularly the StreamExtensions class in the Parallel Extensions Extras) assume one read followed by one write. This won't perform the way I need.
I can't use something simple like Parallel.ForEach or the Extras extension Task.Factory.Iterate because the async I/O tasks don't spend much time on a worker thread, so Parallel just starts up another task, resulting in potentially dozens or hundreds of pending I/O operations; way too much! You can work around that by Waiting on your tasks, but that causes creation of an event handle (a kernel object), and a blocking wait on a task wait handle, which ties up a worker thread. My APM-based implementation avoids both of those things.
I've been playing around with different ways to keep multiple read/write operations in flight, and I've managed to do so using continuations that call a method that creates another task, but it feels awkward, and definitely doesn't feel like idiomatic TPL.
Has anyone else grappled with an issue like this with the TPL? Any suggestions?
If you're worried about too many threads, you can just set ParallelOptions.MaxDegreeOfParallelism to an acceptable number in your call to Parallel.ForEach.
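For what it's worth, that looks roughly like the sketch below (the value 4 and ProcessChunk are arbitrary placeholders for your own settings and read/send logic):
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

Parallel.ForEach(chunks, options, chunk =>
{
    ProcessChunk(chunk); // synchronous read + send for one chunk
});
Keep in mind that this still ties up one worker thread per concurrent operation while it blocks on I/O, so it bounds the fan-out but doesn't give you the thread-free overlapped I/O the question is after.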
