Is there a way to process multiple XML feeds asynchronously?

I am using the Atom10FeedFormatter class to process Atom XML feeds from an OData REST API endpoint.
It works fine, but the API returns results slowly when there are more than 200 entries in the feed.
This is what I use:
Atom10FeedFormatter formatter = new Atom10FeedFormatter();
XNamespace d = "http://schemas.microsoft.com/ado/2007/08/dataservices";
string odataurl = "http://{mysite}/_api/ProjectData/Projects";

using (XmlReader reader = XmlReader.Create(odataurl))
{
    formatter.ReadFrom(reader);
}

foreach (SyndicationItem item in formatter.Feed.Items)
{
    // processing the result
}
I want to speed this process up at least a little by splitting the original request into smaller queries that skip some entries and limit the entry count.
The main idea is to count the number of entries using $count, divide the results into blocks of 20, use $skip and $top in the endpoint URL, iterate through the blocks, and finally merge the results.
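As a side note, a minimal sketch of fetching that count, assuming the service exposes the standard OData $count endpoint (which returns the raw number as plain text):

int countoffeeds;
using (var http = new HttpClient()) // System.Net.Http
{
    // The $count endpoint returns the total number of entries as plain text, e.g. "500".
    string counttext = await http.GetStringAsync("http://{mysite}/_api/ProjectData/Projects/$count");
    countoffeeds = int.Parse(counttext);
}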
int countoffeeds = 500; // for the sake of simplicity; of course, I get it from the odataurl using $count
int numberofblocks = (countoffeeds / 20) + 1;

for (int i = 0; i < numberofblocks; i++)
{
    int skip = i * 20;
    int top = 20;
    string odataurl = "http://{mysite}/_api/ProjectData/Projects" + "?$skip=" + skip + "&$top=" + top;

    Atom10FeedFormatter formatter = new Atom10FeedFormatter();
    using (XmlReader reader = XmlReader.Create(odataurl))
    {
        formatter.ReadFrom(reader); // And this is the part where I am stuck. It returns void, so I
                                    // cannot use Task<void> and process the result later with await
    }
    ...
Normally I would use async calls to the API (in this case numberofblocks = 26 calls in parallel), but I do not know how I would do that. formatter.ReadFrom returns void, so I cannot use it with Task.
How can I solve this, and how can I read multiple XML feeds at the same time?

Normally I would use async calls to the API (in this case numberofblocks = 26 calls in parallel), but I do not know how I would do that. formatter.ReadFrom returns void, so I cannot use it with Task.
Atom10FeedFormatter is a very dated type at this point, and it doesn't support asynchrony. Nor is it likely to be updated to support asynchrony.
How can I solve this, and how can I read multiple XML feeds at the same time?
Since you're stuck in the synchronous world, you do have the option of using "fake asynchrony". This just means you would do the synchronous blocking work on a thread pool thread, and treat each of those operations as though they were asynchronous. I.e.:
var tasks = new List<Task<Atom10FeedFormatter>>();
for (int i = 0; i < numberofblocks; i++)
{
    int skip = i * 20;
    int top = 20;
    tasks.Add(Task.Run(() =>
    {
        string odataurl = "http://{mysite}/_api/ProjectData/Projects" + "?$skip=" + skip + "&$top=" + top;
        Atom10FeedFormatter formatter = new Atom10FeedFormatter();
        using (XmlReader reader = XmlReader.Create(odataurl))
        {
            formatter.ReadFrom(reader);
            return formatter;
        }
    }));
}
var formatters = await Task.WhenAll(tasks);
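Once all the tasks have completed, the feeds can be merged back into a single sequence, e.g. (requires System.Linq):

IEnumerable<SyndicationItem> allItems = formatters.SelectMany(f => f.Feed.Items);
foreach (SyndicationItem item in allItems)
{
    // process the combined results
}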

Related

Async method reading MySQL table produces a list with duplicated elements

My async method returns a list with duplicated elements. It is an async method which uses the MySQL connector to connect to the database. It then executes a query (SELECT *) and, using MySqlDataReader, saves and adds rows to a list until the last ReadAsync() call.
Asynchronous programming is still black magic for me - I would appreciate any feedback or an indication of illogical code lines with an explanation.
This method will be used in my Web API controller, and its purpose is to return all entries from the 'Posts' table. The code works fine when I 'reset' the temp object each loop by using
temp = new Post(); but I assume that is unacceptable? What if my database had not 15 but 15,000 entries?
public async Task<List<Post>> GetPostsAsync()
{
    List<Post> posts = new List<Post>();
    Post temp = new Post();
    try
    {
        await _context.conn.OpenAsync();
        MySqlCommand cmd = new MySqlCommand("USE idunnodb; SELECT * FROM Posts;", _context.conn);
        await using MySqlDataReader reader = await cmd.ExecuteReaderAsync();
        while (await reader.ReadAsync())
        {
            temp.PostID = (int)reader[0];
            temp.UserID = (int)reader[1];
            temp.PostDate = reader[2].ToString();
            temp.PostTitle = reader[3].ToString();
            temp.PostDescription = reader[4].ToString();
            temp.ImagePath = reader[5].ToString();
            posts.Add(temp);
        }
        await _context.conn.CloseAsync();
    }
    catch (Exception ex)
    {
        return Enumerable.Empty<Post>().ToList();
    }
    return posts;
}
It looks like you can only read data by looping through MySqlDataReader; following your code logic, you need to read each row and add it to the list one by one for output.
Your code behaves this way because temp is declared and instantiated outside of your reader.ReadAsync() loop; you are updating the same object reference on each iteration, which is why you are seeing repeated objects in your list.
So you need to instantiate it inside the reader.ReadAsync() loop:
while (await reader.ReadAsync())
{
    Post temp = new Post();
    ...
}
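For completeness, a sketch of the corrected loop, assuming the Post properties shown in the question:

while (await reader.ReadAsync())
{
    // A fresh object per row, so each list entry references its own instance.
    Post temp = new Post
    {
        PostID = (int)reader[0],
        UserID = (int)reader[1],
        PostDate = reader[2].ToString(),
        PostTitle = reader[3].ToString(),
        PostDescription = reader[4].ToString(),
        ImagePath = reader[5].ToString()
    };
    posts.Add(temp);
}

Allocating one small object per row is the normal pattern here; even 15,000 rows is no problem for the garbage collector.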

Blazor WebAssembly Microsoft.JSInterop.JSException Error: The value 'sessionStorage.length' is not a function

From a basic standpoint, what I am trying to do is get a list of keys (key names) from session storage.
The way I am trying to do this is by calling the JsRuntime.InvokeAsync method to:
get the number of keys in session storage, and
loop through the items in session storage and get each key name.
public async Task<List<string>> GetKeysAsync()
{
    var dataToReturn = new List<string>();
    var storageLength = await JsRuntime.InvokeAsync<string>("sessionStorage.length");
    if (int.TryParse(storageLength, out var slength))
    {
        for (var i = 0; i < slength; i++)
        {
            dataToReturn.Add(await JsRuntime.InvokeAsync<string>($"sessionStorage.key({i})"));
        }
    }
    return dataToReturn;
}
When calling JsRuntime.InvokeAsync<string>("sessionStorage.length") or JsRuntime.InvokeAsync<string>("sessionStorage.key(0)"), I am getting the error "The value 'sessionStorage.length' is not a function." or "The value 'sessionStorage.key(0)' is not a function."
I am able to get a single item from session storage using the key name without issue, as in the following example.
public async Task<string> GetStringAsync(string key)
{
    return await JsRuntime.InvokeAsync<string>("sessionStorage.getItem", key);
}
When I use the .length or .key(0) in the Chrome console they work as expected, but not when using the JsRuntime.
I was able to get this to work without using the sessionStorage.length property. I am not 100% happy with the solution, but it does work as needed.
Please see the code below. The main thing on the .key call was to pass the count as a separate argument to the InvokeAsync method.
I think the reason for this is that the JsRuntime.InvokeAsync method automatically adds () to the end of the request, so sessionStorage.length was becoming sessionStorage.length(), which will not work, and sessionStorage.key(0) was becoming sessionStorage.key(0)(), etc. But that is just a guess.
public async Task<List<string>> GetKeysAsync()
{
    var dataToReturn = new List<string>();
    var dataPoint = "1";
    while (!string.IsNullOrEmpty(dataPoint))
    {
        dataPoint = await JsRuntime.InvokeAsync<string>("sessionStorage.key", $"{dataToReturn.Count}");
        if (!string.IsNullOrEmpty(dataPoint))
            dataToReturn.Add(dataPoint);
    }
    return dataToReturn;
}
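Another possible workaround, as an untested sketch: InvokeAsync needs a function identifier, but JavaScript's eval is itself a function, so you can hand it a property expression instead (assuming eval is not blocked by a content security policy):

public async Task<List<string>> GetKeysAsync()
{
    // Object.keys(sessionStorage) returns all key names as a JSON-serializable string array.
    var keys = await JsRuntime.InvokeAsync<string[]>("eval", "Object.keys(sessionStorage)");
    return keys.ToList(); // System.Linq
}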

Deleting Millions of Documents in Cosmos in a reasonable amount of time

Recently I have been working a lot with Cosmos and ran into an issue when looking at deleting documents.
I need to delete around 40 million documents in my Cosmos container. I've looked around quite a bit and found a few options, which I have tried. The two fastest I've tried are using a stored procedure within Cosmos to delete records and using a bulk executor.
Both of these options have given subpar results compared to what I am looking for. I believe this should be achievable within a couple of hours, but at the moment I am getting performance of around 1 hour per million records.
The two methods I used can also be seen here:
Stack Overflow Post on Document Deletion
My documents have about 35 keys, where half are string values and the other half are float/integer values, if that matters, and there are around 100k records per partition.
Here are the two examples that I am using to attempt the deletion.
The first one uses C#; the documentation that helped me with it is here:
GitHub Documentation azure-cosmosdb-bulkexecutor-dotnet-getting-started
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.CosmosDB.BulkExecutor;
using Microsoft.Azure.CosmosDB.BulkExecutor.BulkImport;
using Microsoft.Azure.CosmosDB.BulkExecutor.BulkDelete;

namespace BulkDeleteSample
{
    class Program
    {
        private static readonly string EndpointUrl = "xxxx";
        private static readonly string AuthorizationKey = "xxxx";
        private static readonly string DatabaseName = "xxxx";
        private static readonly string CollectionName = "xxxx";

        static ConnectionPolicy connectionPolicy = new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct,
            ConnectionProtocol = Protocol.Tcp
        };

        static async Task Main(string[] args)
        {
            DocumentClient client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey, connectionPolicy);
            DocumentCollection dataCollection = GetCollectionIfExists(client, DatabaseName, CollectionName);

            // Set retry options high during initialization (default values).
            client.ConnectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 30;
            client.ConnectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 9;

            BulkExecutor bulkExecutor = new BulkExecutor(client, dataCollection);
            await bulkExecutor.InitializeAsync();

            // Set retries to 0 to pass complete control to bulk executor.
            client.ConnectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 0;
            client.ConnectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 0;

            List<Tuple<string, string>> pkIdTuplesToDelete = new List<Tuple<string, string>>();
            for (int i = 0; i < 99999; i++)
            {
                pkIdTuplesToDelete.Add(new Tuple<string, string>("1", i.ToString()));
            }

            BulkDeleteResponse bulkDeleteResponse = await bulkExecutor.BulkDeleteAsync(pkIdTuplesToDelete);
        }

        static DocumentCollection GetCollectionIfExists(DocumentClient client, string databaseName, string collectionName)
        {
            return client.CreateDocumentCollectionQuery(UriFactory.CreateDatabaseUri(databaseName))
                .Where(c => c.Id == collectionName).AsEnumerable().FirstOrDefault();
        }
    }
}
The second one uses a stored procedure I found which deletes data from a given partition using a query; I am running it via a Python notebook.
Here is the stored procedure:
/**
 * A Cosmos DB stored procedure that bulk deletes documents for a given query.
 * Note: You may need to execute this sproc multiple times (depending on whether the sproc is able to delete every document within the execution timeout limit).
 *
 * @function
 * @param {string} query - A query that provides the documents to be deleted (e.g. "SELECT c._self FROM c WHERE c.founded_year = 2008"). Note: For best performance, reduce the # of properties returned per document in the query to only what's required (e.g. prefer SELECT c._self over SELECT * )
 * @returns {Object.<number, boolean>} Returns an object with the two properties:
 *   deleted - contains a count of documents deleted
 *   continuation - a boolean whether you should execute the sproc again (true if there are more documents to delete; false otherwise).
 */
function bulkDeleteSproc(query) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var responseBody = {
        deleted: 0,
        continuation: true
    };

    // Validate input.
    if (!query) throw new Error("The query is undefined or null.");

    tryQueryAndDelete();

    // Recursively runs the query w/ support for continuation tokens.
    // Calls tryDelete(documents) as soon as the query returns documents.
    function tryQueryAndDelete(continuation) {
        var requestOptions = {continuation: continuation};
        var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
            if (err) throw err;
            if (retrievedDocs.length > 0) {
                // Begin deleting documents as soon as documents are returned from the query results.
                // tryDelete() resumes querying after deleting; no need to page through continuation tokens.
                // - this is to prioritize writes over reads given timeout constraints.
                tryDelete(retrievedDocs);
            } else if (responseOptions.continuation) {
                // Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
                tryQueryAndDelete(responseOptions.continuation);
            } else {
                // Else if there are no more documents and no continuation token - we are finished deleting documents.
                responseBody.continuation = false;
                response.setBody(responseBody);
            }
        });

        // If we hit execution bounds - return continuation: true.
        if (!isAccepted) {
            response.setBody(responseBody);
        }
    }

    // Recursively deletes documents passed in as an array argument.
    // Attempts to query for more on empty array.
    function tryDelete(documents) {
        if (documents.length > 0) {
            // Delete the first document in the array.
            var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
                if (err) throw err;
                responseBody.deleted++;
                documents.shift();
                // Delete the next document in the array.
                tryDelete(documents);
            });

            // If we hit execution bounds - return continuation: true.
            if (!isAccepted) {
                response.setBody(responseBody);
            }
        } else {
            // If the document array is empty, query for more documents.
            tryQueryAndDelete();
        }
    }
}
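For reference, a minimal sketch of driving this sproc from C# with the DocumentClient API used above, instead of a Python notebook (the registered sproc id "bulkDeleteSproc", the BulkDeleteResult shape, and the partition key value are assumptions based on the code in this question):

// Matches the responseBody object the sproc sets.
class BulkDeleteResult
{
    public int deleted { get; set; }
    public bool continuation { get; set; }
}

static async Task DeletePartitionAsync(DocumentClient client, string partitionKeyValue)
{
    Uri sprocUri = UriFactory.CreateStoredProcedureUri(DatabaseName, CollectionName, "bulkDeleteSproc");
    var options = new RequestOptions { PartitionKey = new PartitionKey(partitionKeyValue) };
    string query = "SELECT c._self FROM c";

    // Re-execute until the sproc reports it ran out of documents rather than out of time.
    bool more = true;
    while (more)
    {
        StoredProcedureResponse<BulkDeleteResult> result =
            await client.ExecuteStoredProcedureAsync<BulkDeleteResult>(sprocUri, options, query);
        more = result.Response.continuation;
    }
}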
I'm not sure if I am doing anything wrong or if the performance just isn't there with Cosmos, but I'm finding it quite difficult to achieve what I'm looking for; any advice is greatly appreciated.

ServiceStack OrmLite - Elegant way to handle SQL Server Connection Drops

We are currently using OrmLite and it is working really well.
One of the places we are using it is for running large batch processes.
These processes run a single large batch, all within a single transaction; if there are any errors, the transaction rolls back and it needs to be run again.
Is there a way that something like a connection drop (which could be very brief) could be better handled, so that the connection is re-established and processing continues from where it left off?
The only thing that resembles something close to what you're after is using a custom OrmLite Exec Filter, which you can use to inject your own custom execution strategy.
The example on OrmLite's home page shows an exec filter used to execute each query 3 times:
public class ReplayOrmLiteExecFilter : OrmLiteExecFilter
{
    public int ReplayTimes { get; set; }

    public override T Exec<T>(IDbConnection dbConn, Func<IDbCommand, T> filter)
    {
        var holdProvider = OrmLiteConfig.DialectProvider;
        var dbCmd = CreateCommand(dbConn);
        try
        {
            var ret = default(T);
            for (var i = 0; i < ReplayTimes; i++)
            {
                ret = filter(dbCmd);
            }
            return ret;
        }
        finally
        {
            DisposeCommand(dbCmd);
            OrmLiteConfig.DialectProvider = holdProvider;
        }
    }
}

OrmLiteConfig.ExecFilter = new ReplayOrmLiteExecFilter { ReplayTimes = 3 };

using (var db = OpenDbConnection())
{
    db.DropAndCreateTable<PocoTable>();
    db.Insert(new PocoTable { Name = "Multiplicity" });

    var rowsInserted = db.Count<PocoTable>(x => x.Name == "Multiplicity"); //3
}
But it uses the same IDbConnection, i.e. it doesn't create a new DB Connection.
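As an untested sketch of where you could take this, the same hook can catch a failed command, reopen a dropped connection, and retry; note it only retries the individual command, not your whole rolled-back transaction (RetryOrmLiteExecFilter and MaxRetries are hypothetical names):

public class RetryOrmLiteExecFilter : OrmLiteExecFilter
{
    public int MaxRetries { get; set; } = 3;

    public override T Exec<T>(IDbConnection dbConn, Func<IDbCommand, T> filter)
    {
        var holdProvider = OrmLiteConfig.DialectProvider;
        var dbCmd = CreateCommand(dbConn);
        try
        {
            for (var attempt = 0; ; attempt++)
            {
                try
                {
                    return filter(dbCmd);
                }
                catch (Exception) when (attempt < MaxRetries)
                {
                    // Assumption: a dropped connection surfaces as an exception here;
                    // reopen the connection before the next attempt (System.Data).
                    if (dbConn.State != ConnectionState.Open)
                        dbConn.Open();
                }
            }
        }
        finally
        {
            DisposeCommand(dbCmd);
            OrmLiteConfig.DialectProvider = holdProvider;
        }
    }
}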

Java API and code for querying TSDB with millisecond resolution

I have to query OpenTSDB with millisecond precision. By default, when using TsdbQuery (obtained via TSDB.newQuery()), this requirement is not fulfilled. I tried TSQuery, but it didn't work. The returned results are far more than the ones returned by the HTTP query API (which brings the right data that I need):
http://localhost:9999/api/query?start=1197849600.001&end=1197849882.000&msResolution=true&m=sum:CP101_X&ms.
Can you please help on how to achieve a millisecond-precision query using the Java API?
Thanks,
Regards,
Florin
I have used the following Java code:
TSQuery q = getMetricForValidate();
q.validateAndSetQuery();
Query[] queries = q.buildQueries(tsdb);
for (int i = 0; i < queries.length; i++)
{
    DataPoints[] dps = queries[i].run();
    if (dps.length > 0)
    {
        for (DataPoint dp : dps[0])
        {
            System.out.println(dp.timestamp());
        }
    }
}
private static TSQuery getMetricForValidate()
{
    final TSQuery query = new TSQuery();
    query.setStart("1197849600.001");
    query.setEnd("1197849882.000");
    query.setMsResolution(true);
    final ArrayList<TSSubQuery> subs = new ArrayList<TSSubQuery>(1);
    subs.add(getSubMetricForValidate());
    query.setQueries(subs);
    return query;
}

public static TSSubQuery getSubMetricForValidate()
{
    final TSSubQuery sub = new TSSubQuery();
    sub.setAggregator("sum");
    sub.setMetric("01CP101_X");
    sub.setRate(false);
    return sub;
}
While debugging the TSDMain class, I found out that the millisecond filtering is not performed at the database level, but in client code, in HTTPJsonSerializer. I believe this is the currently supported implementation. Is any other solution on the horizon?
Regards,
Florin
