Can consuming a CosmosDB FeedIterator result in multiple continuation tokens? - azure-cosmosdb

So I'm performing a query on a CosmosDB container on behalf of a client and want to support returning continuation tokens to the caller. Right now my query against CosmosDB based off the examples in the official documentation looks like:
FeedIterator<T> feedIterator = container.GetItemQueryIterator<T>(queryDefinition, continuationToken, queryOptions);
// class MyResponse {
// List<T> Objects;
// string? ContinuationToken
// }
var myResponse = new MyResponse();
while (feedIterator.HasMoreResults)
{
FeedResponse<T> response = await feedIterator.ReadNextAsync();
myResponse.ContinuationToken = response.ContinuationToken;
foreach (var result in response)
{
myResponse.Objects.add(result);
}
}
So the documentation opted to use a while loop to consume the FeedIterator, implying it is possible this block is run multiple times and as a result we can consume more than one FeedResponse. Each FeedResponse has a ContinuationToken property, so it seems possible to have multiple ContinuationTokens. If so, is it guaranteed the latest response will have the most up to date continuation token?

Yes; each newer continuation token is set on your response object, overwriting the previous continuation token.
I'm not sure why you have a while loop, though. It seems like you would just want to call ReadNextAsync once per request.

Related

What happens to a SemaphoreSlim when you dereference it?

I'm running into a problem sending massive requests to a .NET Core web service. I'm using a SemaphoreSlim to limit the number of simultaneous requests. When I get a 10061 error (the web service has refused the connection), I want to dial back the number of simultaneous requests. My idea at the moment is to de-reference the SemaphoreSlim and create another:
await this.semaphoreSlim.WaitAsync().ConfigureAwait(false);
counter++;
Uri uri = new Uri($"{api}/{keyProperty}", UriKind.Relative);
string rowVersion = string.Empty;
try
{
HttpResponseMessage getResponse = await this.httpClient.GetAsync(uri).ConfigureAwait(false);
if (getResponse.IsSuccessStatusCode)
{
using (HttpContent httpContent = getResponse.Content)
{
JObject currentObject = JObject.Parse(await httpContent.ReadAsStringAsync().ConfigureAwait(false));
rowVersion = currentObject.Value<string>("rowVersion");
}
}
}
catch (HttpRequestException httpRequestException)
{
SocketException socketException = httpRequestException.InnerException as SocketException;
if (socketException != null && socketException.ErrorCode == PutHandler.ConnectionRefused)
{
this.semaphoreSlim = new SemaphoreSlim(counter * 90 / 100, counter * 90 / 100);
}
}
}
finally
{
this.semaphoreSlim.Release();
}
If I do this, what will happen to the other tasks that are waiting on the Semaphore that I just de-referenced? My guess is that nothing will happen until the object is garbage collected and disposed.
A SemaphoreSlim (just like any other object in .NET) will exist as long as there are references to it.
However, there is a bug in your code: the SemaphoreSlim being released is this.semaphoreSlim, and if this.semaphoreSlim is changed between being acquired and being released, then the code will release a different semaphore than the one that was acquired. To avoid this problem, copy this.semaphoreSlim into a local variable at the beginning of your method, and acquire and release that local variable.
More broadly, there's a difficult in the attempted solution. If you start 1000 tasks, they will all reference the old semaphore and ignore the updated this.sempahoreSlim. So you'd need a separate solution. For example, you could define a disposable "token" which is permission to call the API. Then have an asynchronous collection of these tokens (e.g., a Channel). This gives you full control over how many tokens are released at once.

Parallel httprequest in UWP app

I'm creating an app that requires todo parallel http request, I'm using HttpClient for this.
I'm looping over the urls and foreach URl I start a new Task todo the request.
after the loop I wait untill every task finishes.
However when I check the calls being made with fiddler I see that the request are being called synchronously. It's not like a bunch of request are being made, but one by one.
I've searched for a solution and found that other people have experienced this too, but not with UWP. The solution was to increase the DefaultConnectionLimit on the ServicePointManager.
The problem is that ServicePointManager does not exist for UWP. I've looked in the API's and I thought I could set the DefaultConnectionLimit on HttpClientHandler, but no.
So I have a few Questions.
Is DefaultConnectionLimit still a property that could be set somewhere?
if so, where do i set it?
if not, how do I increase the connnectionlimit?
Is there still a connectionlimit in UWP?
this is my code:
var requests = new List<Task>();
var client = GetHttpClient();
foreach (var show in shows)
{
requests.Add(Task.Factory.StartNew((x) =>
{
((Show)x).NextEpisode = GetEpisodeAsync(((Show)x).NextEpisodeUri, client).Result;}, show));
}
}
await Task.WhenAll(requests.ToArray());
and this is the request:
public async Task<Episode> GetEpisodeAsync(string nextEpisodeUri, HttpClient client)
{
try
{
if (String.IsNullOrWhiteSpace(nextEpisodeUri)) return null;
HttpResponseMessage content; = await client.GetAsync(nextEpisodeUri);
if (content.IsSuccessStatusCode)
{
return JsonConvert.DeserializeObject<EpisodeWrapper>(await content.Content.ReadAsStringAsync()).Episode;
}
}
catch (Exception ex)
{
Debug.WriteLine(ex.Message);
}
return null;
}
Oke. I have the solution. I do need to use async/await inside the task. The problem was the fact I was using StartNew instead of Run. but I have to use StartNew because i'm passing along a state.
With the StartNew. The task inside the task is not awaited for unless you call Unwrap. So Task.StartNew(.....).Unwrap(). This way the Task.WhenAll() will wait untill the inner task is complete.
When u are using Task.Run() you don't have to do this.
Task.Run vs Task.StartNew
The stackoverflow answer
var requests = new List<Task>();
var client = GetHttpClient();
foreach (var show in shows)
{
requests.Add(Task.Factory.StartNew(async (x) =>
{
((Show)x).NextEpisode = await GetEpisodeAsync(((Show)x).NextEpisodeUri, client);
}, show)
.Unwrap());
}
Task.WaitAll(requests.ToArray());
I think an easier way to solve this is not "manually" starting requests but instead using linq with an async delegate to query the episodes and then set them afterwards.
You basically make it a two step process:
Get all next episodes
Set them in the for each
This also has the benefit of decoupling your querying code with the sideeffect of setting the show.
var shows = Enumerable.Range(0, 10).Select(x => new Show());
var client = new HttpClient();
(Show, Episode)[] nextEpisodes = await Task.WhenAll(shows
.Select(async show =>
(show, await GetEpisodeAsync(show.NextEpisodeUri, client))));
foreach ((Show Show, Episode Episode) tuple in nextEpisodes)
{
tuple.Show.NextEpisode = tuple.Episode;
}
Note that i am using the new Tuple syntax of C#7. Change to the old tuple syntax accordingly if it is not available.

Scatter / Gather using Rebus

I have a requirement to a batch a number of web service calls on the receipt of a single message appearing in a (MSMQ) queue.
Is "sagas" the way to go?
The interaction with the 3rd party web service is further complicated because I need to call it once, then poll subsequently for an acknowledgement with a correlation id, returned in the reply to the initial call to the web service.
Yes, sagas could be the way to coordinate the process of initiating the call, polling until the operation has ended, and then doing something else when all the work is done.
If you don't care too much about accidentally making the web service call more than once, you can easily use Rebus' async capabilities to implement the polling - I am currently in the process of building something that basically does this:
public async Task Handle(SomeMessage message)
{
var response = await _client.Get<SomeResponse>("https://someurl") ;
var pollUrl = response.PollUrl;
var resultUrl = response.ResultUrl;
while(true)
{
var result = await _client.Get<PollResult>(pollUrl);
if (result.Status == PollStatus.Processing)
{
await Task.Delay(TimeSpan.FromSeconds(2));
continue;
}
if (result.Status == PollStatus.Done)
{
var finalResult = await _client.Get<FinalResult>(resultUrl);
return new SomeReply(finalResult);;
}
throw new Exception($"Unexpected status while polling {pollUrl}: {result.Status}")
}
}
thus taking advantage of async/await to poll the external webservice while it is processing, while consuming minimal resources in our end.

Handle large number of PUT requests to a rest api

I have been trying to find a way to make this task more efficient. I am consuming a REST based web service and need to update information for over 2500 clients.
I am using fiddler to watch the requests, and I'm also updating a table with an update time when its complete. I'm getting about 1 response per second. Are my expectations to high? I'm not even sure what I would define as 'fast' in this context.
I am handling everything in my controller and have tried running multiple web requests in parallel based on examples around the place but it doesn't seem to make a difference. To be honest I don't understand it well enough and was just trying to get it to build. I suspect it is still waiting for each request to complete before firing again.
I have also increased connections in my web config file as per another suggestion with no success:
<system.net>
<connectionManagement>
<add address="*" maxconnection="20" />
</connectionManagement>
</system.net>
My Controllers action method looks like this:
public async Task<ActionResult> UpdateMattersAsync()
{
//Only get matters we haven't synced yet
List<MatterClientRepair> repairList = Data.Get.AllUnsyncedMatterClientRepairs(true);
//Take the next 500
List<MatterClientRepair> subRepairList = repairList.Take(500).ToList();
FinalisedMatterViewModel vm = new FinalisedMatterViewModel();
using (ApplicationDbContext db = new ApplicationDbContext())
{
int jobCount = 0;
foreach (var job in subRepairList)
{
// If not yet synced - it shouldn't ever be!!
if (!job.Synced)
{
jobCount++;
// set up some Authentication fields
var oauth = new OAuth.Manager();
oauth["access_token"] = Session["AccessToken"].ToString();
string uri = "https://app.com/api/v2/matters/" + job.Matter;
// prepare the json object for the body
MatterClientJob jsonBody = new MatterClientJob();
jsonBody.matter = new MatterForUpload();
jsonBody.matter.client_id = job.NewClient;
string jsonString = jsonBody.ToJSON();
// Send it off. It returns the whole object we updated - we don't actually do anything with it
Matter result = await oauth.Update<Matter>(uri, oauth["access_token"], "PUT", jsonString);
// update our entities
var updateJob = db.MatterClientRepairs.Find(job.ID);
updateJob.Synced = true;
updateJob.Update_Time = DateTime.Now;
db.Entry(updateJob).State = System.Data.Entity.EntityState.Modified;
if (jobCount % 50 == 0)
{
// save every 50 changes
db.SaveChanges();
}
}
}
// if there are remaining files to save
if (jobCount % 50 != 0)
{
db.SaveChanges();
}
return View("FinalisedMatters", Data.Get.AllMatterClientRepairs());
}
}
And of course the Update method itself which handles the Web requesting:
public async Task<T> Update<T>(string uri, string token, string method, string json)
{
var authzHeader = GenerateAuthzHeader(uri, method);
// prepare the token request
var request = (HttpWebRequest)WebRequest.Create(uri);
request.Headers.Add("Authorization", authzHeader);
request.Method = method;
request.ContentType = "application/json";
request.Accept = "application/json, text/javascript";
byte[] bytes = System.Text.Encoding.ASCII.GetBytes(json);
request.ContentLength = bytes.Length;
System.IO.Stream os = request.GetRequestStream();
os.Write(bytes, 0, bytes.Length);
os.Close();
WebResponse response = await request.GetResponseAsync();
using (var reader = new System.IO.StreamReader(response.GetResponseStream()))
{
return JsonConvert.DeserializeObject<T>(reader.ReadToEnd());
}
}
If it's not possible to do more than 1 request per second then I'm interested in looking at an Ajax solution so I can give the user some feedback while it is processing. In my current solution I cannot give the user feedback while the action method hasn't reached 'return' yet can I?
Okay it's taken me a few days (and a LOT of trial and error) but I've worked this out. Hopefully it can help others. I finally found my silver bullet. And it was probably the place I should have started:
MSDN: Consuming the Task-based Asynchronous Pattern
In the end this following line of code is what brought it all to light.
string [] pages = await Task.WhenAll(from url in urls select DownloadStringAsync(url));
I substituted a few things to make it work for a Put request as follows:
HttpResponseMessage[] results = await Task.WhenAll(from p in toUpload select client.PutAsync(p.uri, p.jsonContent));
'toUpload' is a List of MyClass:
public class MyClass
{
// the URI should be relative to the base pase
// (ie: /api/v2/matters/101)
public string uri { get; set; }
// a string in JSON format, being the body of the PUT request
public StringContent jsonContent { get; set; }
}
The key was to stop trying to put my PutAsync method inside a loop. My new line of code IS still blocking until ALL responses have come back, but that is what I wanted. Also, learning that I could use this LINQ style expression to create a Task List on the fly was immeasurably helpful. I won't post all the code (unless someone wants it) because it's not as nicely refactored as the original and I still need to check whether the response of each item was 200 OK before I record it as successfully saved in my database. So how much faster is it?
Results
I tested a sample of 50 web service calls from my local machine. (There is some saving of records to a SQL Database in Azure at the end).
Original Synchronous Code: 70.73 seconds
Asynchronous Code: 8.89 seconds
That's gone from 1.4146 requests per second down to a mind melting 0.1778 requests per second! (if you average it out)
Conclusion
My journey isn't over. I've just scratched the surface of asynchronous programming and am loving it. I need to now work out how to save only the results that have returned 200 OK. I can deserialize the HttpResponse which returns a JSON object (which has a unique ID I can look up etc.) OR I could use the Task.WhenAny method, and experiment with Interleaving.

Some questions concerning the combination of ADO.NET, Dapper QueryAsync and Glimpse.ADO

I have been experimenting with a lightweight solution for handling my business logic. It consists of a vanilla ADO.NET connection that is extended with Dapper, and monitored by Glimpse.ADO. The use case for this setup will be a web application that has to process a handful of queries asynchronously per request. Below a simple implementation of my setup in an MVC controller.
public class CatsAndDogsController : Controller
{
public async Task<ActionResult> Index()
{
var fetchCatsTask = FetchCats(42);
var fetchDogsTask = FetchDogs(true);
await Task.WhenAll(fetchCatsTask, fetchDogsTask);
ViewBag.Cats = fetchCatsTask.Result;
ViewBag.Dogs = fetchDogsTask.Result;
return View();
}
public async Task<IEnumerable<Cat>> FetchCats(int breedId)
{
IEnumerable<Cat> result = null;
using (var connection = CreateAdoConnection())
{
await connection.OpenAsync();
result = await connection.QueryAsync<Cat>("SELECT * FROM Cat WHERE BreedId = #bid;", new { bid = breedId });
connection.Close();
}
return result;
}
public async Task<IEnumerable<Dog>> FetchDogs(bool isMale)
{
IEnumerable<Dog> result = null;
using (var connection = CreateAdoConnection())
{
await connection.OpenAsync();
result = await connection.QueryAsync<Dog>("SELECT * FROM Dog WHERE IsMale = #im;", new { im = isMale });
connection.Close();
}
return result;
}
public System.Data.Common.DbConnection CreateAdoConnection()
{
var sqlClientProviderFactory = System.Data.Common.DbProviderFactories.GetFactory("System.Data.SqlClient");
var dbConnection = sqlClientProviderFactory.CreateConnection();
dbConnection.ConnectionString = "SomeConnectionStringToAwesomeData";
return dbConnection;
}
}
I have some questions concerning the creation of the connection in the CreateAdoConnection() method. I assume the following is happening behind the scenes.
The call to sqlClientProviderFactory.CreateConnection() returns an instance of System.Data.SqlClient.SqlConnection passed as a System.Data.Common.DbConnection. At this point Glimpse.ADO.AlternateType.GlimpseDbProviderFactory kicks in and wraps this connection in an instance of Glimpse.Ado.AlternateType.GlimpseDbConnection, which is also passed as a System.Data.Common.DbConnection. Finally, this connection is indirectly extended by the Dapper library with its query methods, among them the QueryAsync<>() method used to fetch the cats and dogs.
The questions:
Is the above assumption correct?
If I use Dapper's async methods with this connection - or create a System.Data.Common.DbCommand with this connection's CreateCommand() method, and use it's async methods - will those calls internally always end up using the vanilla async implementations of these methods as Microsoft has written them for System.Data.SqlClient.SqlConnection and System.Data.SqlClient.SqlCommand? And not some other implementations of these methods that are actually blocking?
How much perf do I lose with this setup compared to just returning a new System.Data.SqlClient.SqlConnection directly? (So, without the Glimpse.ADO wrapper)
Any suggestions on improving this setup?
Yes pretty much. GlimpseDbProviderFactory wraps/decorates/proxies all the registered factories. We then pass any calls we get through to the factory we wrap (in this case SQL Server). In the case of CreateConnection() we ask the inner factory we have, to create a connection, when we get that connection, we wrap it and then return it to the originating caller
Yes. Glimpse doesn't turn what was an async request into a blocking request. We persevere the async chain all the way though. If you are interested, the code in question is here.
Very little. In essence, using a decorator pattern like this adds only one or two frames to the call stack. Compared to most operations performed during the request lifecycle, the time to observe whats happening here is extremely minimal.
What you have looks great. Only suggestion is to maybe us this code to build the factory. This code means that you can shift your connection string, etc to the web.config.

Resources