How to make an HttpClient GetAsync wait for a webpage that loads data asynchronously? - .net-core

I'm making a snippet that sends data to a website that analyses it then sends back results.
Is there any way to make my GetAsych wait until the website finishes its calculation before getting a "full response"?
Ps: The await will not know if the page requested contains any asynchronous processing (eg: xhr calls)- I already use await and ReadAsByteArrayAsync()/ReadAsStringAsync()
Thank you!

You will need something like Selenium to not only fetch the HTML of the website but to fully render the page and execute any dynamic scripts.
You can then hook into some events, wait for certain DOM elements to appear or just wait some time until the page is fully initialized.
Afterwards you can use the API of Selenium to access the DOM and extract the information you need.
Example code:
using (var driver = new ChromeDriver(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)))
{
driver.Navigate().GoToUrl(#"https://automatetheplanet.com/multiple-files-page-objects-item-templates/");
var link = driver.FindElement(By.PartialLinkText("TFS Test API"));
var jsToBeExecuted = $"window.scroll(0, {link.Location.Y});";
((IJavaScriptExecutor)driver).ExecuteScript(jsToBeExecuted);
var wait = new WebDriverWait(driver, TimeSpan.FromMinutes(1));
var clickableElement = wait.Until(ExpectedConditions.ElementToBeClickable(By.PartialLinkText("TFS Test API")));
clickableElement.Click();
}
Source: https://www.automatetheplanet.com/webdriver-dotnetcore2/

What you're looking for here is the await operator. According to the docs:
The await operator suspends evaluation of the enclosing async method until the asynchronous operation represented by its operand completes.
Sample use within the context of an HttpClient object:
public static async Task Main()
{
// send the HTTP GET request
var response = await httpClient.GetAsync("my-url");
// get the response string
// there are other `ReadAs...()` methods if the return type is not a string
var getResult = response.Content.ReadAsStringAsync();
}
Note that the method that encloses the await-ed code is marked as async and has a return type of Task (Task<T> would also work, depending on your needs).

Related

Apify cheerio scraper stops even with urls in the queue

here is the scenario, I'm using the cheerio scraper to scraper a website containing real estate announces.
Each announce has the link to the next announce so before scrapint the current page I add the next page in the request queue.
What it happens always at certain and a random point is that the scraper stops without any reason, even if in the queue there is the next page to scrape (I add the image).
Why does this happens since there is still a pending request in the queue?
Many thanks
Here is the message I get:
2021-02-28T10:52:35.439Z INFO CheerioCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2021-02-28T10:52:35.672Z INFO CheerioCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":963,"requestsFinishedPerMinute":50,"requestsFailedPerMinute":0,"requestTotalDurationMillis":22143,"requestsTotal":23,"crawlerRuntimeMillis":27584,"requestsFinished":23,"requestsFailed":0,"retryHistogram":[23]}
2021-02-28T10:52:35.679Z INFO Cheerio Scraper finished.
Here the request queue:
Here the code
async function pageFunction(context) {
const { $, request, log } = context;
// The "$" property contains the Cheerio object which is useful
// for querying DOM elements and extracting data from them.
const pageTitle = $('title').first().text();
// The "request" property contains various information about the web page loaded.
const url = request.url;
// Use "log" object to print information to actor log.
log.info('Scraping Page', { url, pageTitle });
// Adding next page to the queue
var baseUrl = '...';
if($('div.d3-detailpager__element--next a').length > 0)
{
var nextPageUrl = $('div.d3-detailpager__element--next a').attr('href');
log.info('Found another page', { nextUrl: baseUrl.concat(nextPageUrl) });
context.enqueueRequest({ url:baseUrl.concat(nextPageUrl) });
}
// My code for scraping follows here
return { /*my scaped object*/}
}
Missing await
await context.enqueueRequest

Flutter multiple async methods for parrallel execution

I'm still struggeling with the async/await pattern so I'm here to ask you some precisions.
I saw this page explaining the async/await pattern pretty well. I'm posting here the example that bother me :
import 'dart:async';
Future<String> firstAsync() async {
await Future<String>.delayed(const Duration(seconds: 2));
return "First!";
}
Future<String> secondAsync() async {
await Future<String>.delayed(const Duration(seconds: 2));
return "Second!";
}
Future<String> thirdAsync() async {
await Future<String>.delayed(const Duration(seconds: 2));
return "Third!";
}
void main() async {
var f = await firstAsync();
print(f);
var s = await secondAsync();
print(s);
var t = await thirdAsync();
print(t);
print('done');
}
In this example, each async method is called one after another, so the execution time for the main function is 6 seconds (3 x 2 seconds). However, I don't understand what's the point of asynchronous function if they are executed one after another.
Are async functions not supposed to execute in the background ? Is it not the point of multiple async functions to fastens the process with parrallel execution ?
I think I'm missing something about asynchronous functions and async/await pattern in flutter so if you could explain me that, it would be very appreciated.
Best
Waiting on multiple Futures to complete using Future.wait()
If the order of execution of the functions is not important, you can use Future.wait().
The functions get triggered in quick succession; when all of them complete with a value, Future.wait() returns a new Future. This Future completes with a list containing the values produced by each function.
Future
.wait([firstAsync(), secondAsync(), thirdAsyncC()])
.then((List responses) => chooseBestResponse(responses))
.catchError((e) => handleError(e));
or with async/await
try {
List responses = await Future.wait([firstAsync(), secondAsync(), thirdAsyncC()]);
} catch (e) {
handleError(e)
}
If any of the invoked functions completes with an error, the Future returned by Future.wait() also completes with an error. Use catchError() to handle the error.
Resource:https://v1-dartlang-org.firebaseapp.com/tutorials/language/futures#waiting-on-multiple-futures-to-complete-using-futurewait
The example is designed to show how you can wait for a long-running process without actually blocking the thread. In practice, if you have several of those that you want to run in parallel (for example: independent network calls), you could optimize things.
Calling await stops the execution of the method until the future completes, so the call to secondAsync will not happen until firstAsync finishes, and so on. If you do this instead:
void main() async {
var f = firstAsync();
var s = secondAsync();
var t = thirdAsync();
print(await f);
print(await s);
print(await t);
print('done');
}
then all three futures are started right away, and then you wait for them to finish in a specific order.
It is worth highlighting that now f, s, and t have type Future<String>. You can experiment with different durations for each future, or changing the order of the statements.
If anyone new in this problem use the async . Dart has a function called FutureGroup. You can use it to run futures in parallel.
Sample:
final futureGroup = FutureGroup();//instantiate it
void runAllFutures() {
/// add all the futures , this is not the best way u can create an extension method to add all at the same time
futureGroup.add(hello());
futureGroup.add(checkLocalAuth());
futureGroup.add(hello1());
futureGroup.add(hello2());
futureGroup.add(hello3());
// call the `.close` of the group to fire all the futures,
// once u call `.close` this group cant be used again
futureGroup.close();
// await for future group to finish (all futures inside it to finish)
await futureGroup.future;
}
This futureGroup has some useful methods which can help you ie. .future etc.. check the documentation to get more info.
Here's a sample usage Example One using await/async and Example Two using Future.then.
you can always use them in a single future
final results = await Future.wait([
firstAsync();
secondAsync();
thirdAsync();
]);
results will be an array of you return type. in this case array of strings.
cheers.
Try this resolve.
final List<Future<dynamic>> featureList = <Future<dynamic>>[];
for (final Partner partner in partnerList) {
featureList.add(repository.fetchAvatar(partner.uid));
}
await Future.wait<dynamic>(featureList);
If want parallel execution you should switch to multi thread concept called Isolates
mix this with async/await concepts . You can also check this website for more
https://buildflutter.com/flutter-threading-isolates-future-async-and-await/
Using async / await like that is useful when you need a resource before executing the next task.
In your example you don't do really useful things, but imagine you call firstAsync, that gives you a stored authorization token in your phone, then you call secondAsync giving this token get asynchronously and execute an HTTP request and then checking the result of this request.
In this case you don't block the UI thread (user can interact with your app) and other tasks (get token, HTTP request...) are done in background.
i think you miss understood how flutter works first flutter is not multi threaded.....!
second if it isn't multi threaded how can it executes parallel tasks, which doesnt happen....! here is some links that will help you understand more https://webdev.dartlang.org/articles/performance/event-loop
https://www.dartlang.org/tutorials/language/futures
flutter doesn't put futures on another thread but what happens that they are added to a queue the links that i added are for event loop and how future works. hope you get it , feel free to ask me :)

Servicestack async method (v4)

at this moment I am developing an android db access to a servicestack web api.
I need to show a message of "wait please..." when the user interacts with the db, I read some documentation:
Calling from client side an async method
But I cannot found how the async method must be implemented in the webapi, serverside.
Can you please share a link to the documentation? or a simple working sample?
I have this before trying to convert it to an async call.
var response = jsonClient.Send(new UsuarioLogin { Cuenta = txtCuenta.Text, P = txtPass.Text});
After some read, the latter code is converted into this:
var response = await jsonClient.SendAsync(new UsuarioLogin { Cuenta = txtCuenta.Text, P = txtPass.Text});
So, in the server side I have this(just an extract):
public object Post(UsuarioLogin request)
{
var _usuario = usuarioRepo.Login(_cuenta, _password);
if (_usuario != null)
{
if (_usuario.UsuarioPerfilId != 2 )
{
return new UsuarioLoginResponse { Ret = -3 };
}
How can I convert it to an async method?
Thanks in advance.
Server-side and client-side async are opaque and unrelated, i.e. an async call on the client works the same way irrespective if it's calling a sync or an async service, there's also no difference for sync client either. Essentially how the server is implemented has no effect on how it's called on the client.
To make an async service on the Server you just need to return a Task, you can look at the AsyncTaskTests for different examples of how to create an async Service.
There are a lot of benefits for making non-blocking async calls on the client since this can be called from the UI thread without blocking the UI. But there's less value of server-side async which adds a lot of hidden artificial complexity, is easy to deadlock, requires rewriting I/O operations to be async and in many cases wont improve performance unless I/O is the bottleneck, e.g. async doesn't help when calling a single RDBMS. You're going to get a lot more performance benefits from using Caching instead.

async / await and Task.Run -- How to know when everything is done

My actual program is more sophisticated than this, but I've tried to simplify things.
So let's say I'm reading a file with a list of URLs. I want to download the HTML from each URL and process it. The processing may be a bit complex, so I'd like it to be done on a separate thread.
The basic problem is how to tell when all the processing is done. For example, if the user tries to close the program before all URLs are processed, I want to give him a message and not exit the program. Alternatively, I want to terminate the program (perhaps with a MsgBox("Done") message) as soon as all the URLs are processed.
I'd like my code to look something as follows (assuming I've got an outer loop reading the URLs and calling this routine)...
List<Task> TaskList = new List<Task>();
async void ProcessSingleUrl(string url) {
var web = new HttpClient();
var WebPageContents = await web.GetStringAsync(url);
Task t = Task.Run(() => ProcessWebPage(WebPageContents);
TaskList.Add(t);
}
The above code should run very quickly (Async methods run pretty well instantly) and return to the caller almost immediately.
But at that point, I may well have no entries whatsoever in TaskList, since a task isn't defined until the GetStringAsync is completed, and none (or maybe just a few) may have finished by then. So
Task.WaitAll(TaskList.ToArray());
doesn't work the way I need it to.
If absolutely necessary, I could first read in all the URLs and know how many Tasks to expect, but I'm hoping for a more elegant solution.
I suppose I could increment a counter just before the await, but that feels a bit kludgy.
I assume I'm structuring things incorrectly, but I'm not sure how to reorganize things.
Note: I'm not wedded to Task.Run. Good ol' QueueWorkItem is a possibility, but I think it has pretty well the same problems.
I assume I'm structuring things incorrectly, but I'm not sure how to reorganize things.
I think that is true. Here is a possible solution: Store the whole computation as a Task in your list, not just the second part:
async Task ProcessSingleUrlInner(string url) {
var web = new HttpClient();
var WebPageContents = await web.GetStringAsync(url);
Task t = Task.Run(() => ProcessWebPage(WebPageContents);
await t;
}
void ProcessSingleUrl(string url) {
var t = ProcessSingleUrlInner(url);
TaskList.Add(t);
}
Waiting on all tasks of this list will guarantee that everything is done. Probably, you need to adapt this idea to your exact needs.
I'm assuming that you're getting the list of urls as an IEnumerable<string> or some such.
You can use LINQ to convert each url into a Task, and then await them all to complete:
async Task ProcessUrls(IEnumerable<string> urls)
{
var tasks = urls.Select(async url =>
{
var web = new HttpClient();
var WebPageContents = await web.GetStringAsync(url);
await Task.Run(() => ProcessWebPage(WebPageContents);
});
await Task.WhenAll(tasks);
}
Note that if you use this solution and there are multiple different urls that have errors, then Task.WhenAll will only report one of those errors.

How to alter object server-side before save in Meteor

I am trying to run certain insert statements after a collection has been updated. For example, if a user adds an embedded document Location to their User document, I would like to also insert that embedded document into a separate Location collection. Is there a way to do this on the server side so that the operation is guaranteed to run?
If you're willing to use some code I wrote (https://gist.github.com/matb33/5258260), you can hook in like so:
EDIT: code is now part of a project at https://github.com/matb33/meteor-collection-hooks
var test = new Meteor.Collection("test");
if (Meteor.isServer) {
test.before("insert", function (userId, doc) {
doc.created = doc.created || Date.now();
});
test.before("update", function (userId, selector, modifier, options) {
if (!modifier.$set) modifier.$set = {};
modifier.$set.modified = Date.now();
});
test.after("update", function (userId, selector, modifier, options, previous) {
doSomething();
});
}
you would need to do it in a method.. you can keep latency compensation by implementing a client-side Method stub:
Calling methods on the client defines stub functions associated with
server methods of the same name. You don't have to define a stub for
your method if you don't want to. In that case, method calls are just
like remote procedure calls in other systems, and you'll have to wait
for the results from the server.
If you do define a stub, when a client invokes a server method it will
also run its stub in parallel. On the client, the return value of a
stub is ignored. Stubs are run for their side-effects: they are
intended to simulate the result of what the server's method will do,
but without waiting for the round trip delay. If a stub throws an
exception it will be logged to the console.
see my Meteor stub example here: https://stackoverflow.com/a/13145432/1029644

Resources