We are currently using ORMLite and it is working really well.
One of the places that we are using it is for running large batch processes.
Each process runs a single large batch inside one transaction. If there are any errors, it rolls back the transaction and the batch has to be run again.
Is there a way to handle something like a connection drop (which could be very brief) more gracefully, so that the process re-establishes the connection and continues from where it left off?
The closest thing to what you're after is a custom OrmLite Exec Filter, which lets you inject your own execution strategy.
The example on OrmLite's home page uses an Exec Filter to execute each query 3 times:
public class ReplayOrmLiteExecFilter : OrmLiteExecFilter
{
public int ReplayTimes { get; set; }
public override T Exec<T>(IDbConnection dbConn, Func<IDbCommand, T> filter)
{
var holdProvider = OrmLiteConfig.DialectProvider;
var dbCmd = CreateCommand(dbConn);
try
{
var ret = default(T);
for (var i = 0; i < ReplayTimes; i++)
{
ret = filter(dbCmd);
}
return ret;
}
finally
{
DisposeCommand(dbCmd);
OrmLiteConfig.DialectProvider = holdProvider;
}
}
}
OrmLiteConfig.ExecFilter = new ReplayOrmLiteExecFilter { ReplayTimes = 3 };
using (var db = OpenDbConnection())
{
db.DropAndCreateTable<PocoTable>();
db.Insert(new PocoTable { Name = "Multiplicity" });
var rowsInserted = db.Count<PocoTable>(x => x.Name == "Multiplicity"); //3
}
But it uses the same IDbConnection, i.e. it doesn't create a new DB Connection.
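Since the filter reuses the same IDbConnection, recovering from a dropped connection generally means opening a new connection and re-running the unit of work; a transaction that was open on the lost connection cannot be resumed. Below is a minimal sketch of that idea. `ConnectionRetry` is a hypothetical helper, not part of OrmLite, and the default rule of treating any `DbException` as retryable is an assumption you would want to narrow for your database provider.

```csharp
using System;
using System.Data;
using System.Data.Common;

// Hypothetical helper, not part of OrmLite: runs a unit of work on a fresh
// connection and retries on a new connection when a transient error occurs.
public static class ConnectionRetry
{
    public static T Execute<T>(
        Func<IDbConnection> openConnection,   // e.g. () => OpenDbConnection()
        Func<IDbConnection, T> work,          // should begin and commit its own transaction
        int maxAttempts = 3,
        Func<Exception, bool> isTransient = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        // Assumption: treat any provider-level DbException as retryable by default.
        isTransient = isTransient ?? (ex => ex is DbException);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                using (var db = openConnection())
                {
                    return work(db);
                }
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // The failed connection has been disposed; loop to open a new one.
            }
        }
    }
}
```

With this shape, each call to `work` starts from scratch on a new connection, so splitting the large batch into smaller units that each commit their own transaction limits how much has to be redone after a drop.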
Recently I have been working a lot with Cosmos and ran into an issue when deleting documents.
I need to delete around 40 million documents in my Cosmos container. I've looked around quite a bit and tried a few options; the two fastest I've found are using a stored procedure within Cosmos to delete records and using the bulk executor.
Both options have given subpar results compared to what I am looking for. I believe this should be achievable within a couple of hours, but at the moment I am getting around 1 hour per million records.
The two methods I used can also be seen here:
Stack Overflow Post on Document Deletion
My documents are about 35 keys long where half are string values and the other half are float/integer values, if that matters, and there are around 100k records per partition.
Here are the two examples that I am using to attempt the deletion:
This first one is using C# and the documentation that helped me with this is here:
GitHub Documentation azure-cosmosdb-bulkexecutor-dotnet-getting-started
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.CosmosDB.BulkExecutor;
using Microsoft.Azure.CosmosDB.BulkExecutor.BulkImport;
using Microsoft.Azure.CosmosDB.BulkExecutor.BulkDelete;
namespace BulkDeleteSample
{
class Program
{
private static readonly string EndpointUrl = "xxxx";
private static readonly string AuthorizationKey = "xxxx";
private static readonly string DatabaseName = "xxxx";
private static readonly string CollectionName = "xxxx";
static ConnectionPolicy connectionPolicy = new ConnectionPolicy
{
ConnectionMode = ConnectionMode.Direct,
ConnectionProtocol = Protocol.Tcp
};
static async Task Main(string[] args)
{
DocumentClient client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey, connectionPolicy);
DocumentCollection dataCollection = GetCollectionIfExists(client, DatabaseName, CollectionName);
// Set retry options high during initialization (default values).
client.ConnectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 30;
client.ConnectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 9;
BulkExecutor bulkExecutor = new BulkExecutor(client, dataCollection);
await bulkExecutor.InitializeAsync();
// Set retries to 0 to pass complete control to bulk executor.
client.ConnectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 0;
client.ConnectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 0;
List<Tuple<string, string>> pkIdTuplesToDelete = new List<Tuple<string, string>>();
for (int i = 0; i < 99999; i++)
{
pkIdTuplesToDelete.Add(new Tuple<string, string>("1", i.ToString()));
}
BulkDeleteResponse bulkDeleteResponse = await bulkExecutor.BulkDeleteAsync(pkIdTuplesToDelete);
}
static DocumentCollection GetCollectionIfExists(DocumentClient client, string databaseName, string collectionName)
{
return client.CreateDocumentCollectionQuery(UriFactory.CreateDatabaseUri(databaseName))
.Where(c => c.Id == collectionName).AsEnumerable().FirstOrDefault();
}
}
}
The second one uses a stored procedure I found that deletes data from a given partition using a query; I run it from a Python notebook.
Here is the stored procedure:
/**
* A Cosmos DB stored procedure that bulk deletes documents for a given query.
* Note: You may need to execute this sproc multiple times (depending whether the sproc is able to delete every document within the execution timeout limit).
*
* @function
* @param {string} query - A query that provides the documents to be deleted (e.g. "SELECT c._self FROM c WHERE c.founded_year = 2008"). Note: For best performance, reduce the # of properties returned per document in the query to only what's required (e.g. prefer SELECT c._self over SELECT * )
* @returns {Object.<number, boolean>} Returns an object with the two properties:
* deleted - contains a count of documents deleted
* continuation - a boolean whether you should execute the sproc again (true if there are more documents to delete; false otherwise).
*/
function bulkDeleteSproc(query) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var responseBody = {
deleted: 0,
continuation: true
};
// Validate input.
if (!query) throw new Error("The query is undefined or null.");
tryQueryAndDelete();
// Recursively runs the query w/ support for continuation tokens.
// Calls tryDelete(documents) as soon as the query returns documents.
function tryQueryAndDelete(continuation) {
var requestOptions = {continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
if (err) throw err;
if (retrievedDocs.length > 0) {
// Begin deleting documents as soon as documents are returned from the query results.
// tryDelete() resumes querying after deleting; no need to page through continuation tokens.
// - this is to prioritize writes over reads given timeout constraints.
tryDelete(retrievedDocs);
} else if (responseOptions.continuation) {
// Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
tryQueryAndDelete(responseOptions.continuation);
} else {
// Else if there are no more documents and no continuation token - we are finished deleting documents.
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
}
// Recursively deletes documents passed in as an array argument.
// Attempts to query for more on empty array.
function tryDelete(documents) {
if (documents.length > 0) {
// Delete the first document in the array.
var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
if (err) throw err;
responseBody.deleted++;
documents.shift();
// Delete the next document in the array.
tryDelete(documents);
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
} else {
// If the document array is empty, query for more documents.
tryQueryAndDelete();
}
}
}
I'm not sure whether I am doing something wrong or whether the performance just isn't there with Cosmos, but I'm finding it quite difficult to achieve what I'm looking for. Any advice is greatly appreciated.
I currently have a problem with the datareader when creating Microsoft.SqlServer.Management.Smo.Table asynchronously. Note: I derived my SmoTable from TableView and IDisposable.
private async Task Generate()
{
await Task.Run(()=>
{
MSSMSDatabase db = CreateDB(txtDBname.Text);
List<string> tableNames = GetTableNameList();
foreach(string tableName in tableNames)
{
using(SmoTable tbl = new Table(db, tableName)) // <=== after a few loops, the error occurs within here.
{
foreach(var col in columnList)
{
tbl.AddColumns(col);
}
tbl.Create();
}
}
});
}
Microsoft.SqlServer.Management.Smo.FailedOperationException: InvalidOperationException: There is already an open DataReader associated with this Connection which must be closed first.
I tried implementing IDisposable on my SmoTable class, which I derived from the TableView class, but I still get the same error.
Thanks in advance.
Through trial and error I found that you need to create a new connection for each table creation so that each one gets a separate DataReader. So, if you include the instantiation of Server in the foreach loop, it will create a new connection and hence a new DataReader.
foreach(string tableName in tableNames)
{
using(SmoTable tbl = new Table(db, tableName)) // <=== after a few loops, the error occurs within here.
{
foreach(var col in columnList)
{
_server = GetSQLServer(); // <=== this is basically Server server = new Server(); return server; kind of method.
db = _server.Databases[_databaseName];
tbl.AddColumns(col);
}
tbl.Create();
}
}
I have a custom PXButton called UploadRecords. When I click this button, it should populate the grid with records and release them.
The Release action is pressed in the UploadRecords action delegate. The code works properly when releasing a small number of records, but when thousands of records are passed to release it takes a very long time (more than 30 minutes) and fails with an execution timeout error.
Please suggest how to reduce the execution time and release the records faster.
namespace PX.Objects.AR
{
public class ARPriceWorksheetMaint_Extension : PXGraphExtension<ARPriceWorksheetMaint>
{
//public class string_R112 : Constant<string>
//{
// public string_R112()
// : base("4E5CCAFC-0957-4DB3-A4DA-2A24EA700047")
// {
// }
//}
public class string_R112 : Constant<string>
{
public string_R112()
: base("EA")
{
}
}
public PXSelectJoin<InventoryItem, InnerJoin<CSAnswers, On<InventoryItem.noteID, Equal<CSAnswers.refNoteID>>,
LeftJoin<INItemCost, On<InventoryItem.inventoryID, Equal<INItemCost.inventoryID>>>>,
Where<InventoryItem.salesUnit, Equal<string_R112>>> records;
public PXAction<ARPriceWorksheet> uploadRecord;
[PXUIField(DisplayName = "Upload Records", MapEnableRights = PXCacheRights.Select, MapViewRights = PXCacheRights.Select)]
[PXButton]
public IEnumerable UploadRecord(PXAdapter adapter)
{
using (PXTransactionScope ts = new PXTransactionScope())
{
foreach (PXResult<InventoryItem, CSAnswers, INItemCost> res in records.Select())
{
InventoryItem invItem = (InventoryItem)res;
INItemCost itemCost = (INItemCost)res;
CSAnswers csAnswer = (CSAnswers)res;
ARPriceWorksheetDetail gridDetail = new ARPriceWorksheetDetail();
gridDetail.PriceType = PriceTypeList.CustomerPriceClass;
gridDetail.PriceCode = csAnswer.AttributeID;
gridDetail.AlternateID = "";
gridDetail.InventoryID = invItem.InventoryID;
gridDetail.Description = invItem.Descr;
gridDetail.UOM = "EA";
gridDetail.SiteID = 6;
InventoryItemExt invExt = PXCache<InventoryItem>.GetExtension<InventoryItemExt>(invItem);
decimal y;
// Fall back to parsing with spaces removed when the raw value does not parse.
if (!decimal.TryParse(csAnswer.Value, out y))
{
y = decimal.Parse(csAnswer.Value.Replace(" ", ""));
}
gridDetail.CurrentPrice = y; //(invExt.UsrMarketCost ?? 0m) * (Math.Round(y / 100, 2));
gridDetail.PendingPrice = y; // (invExt.UsrMarketCost ?? 0m)* (Math.Round( y/ 100, 2));
gridDetail.TaxID = null;
Base.Details.Update(gridDetail);
}
ts.Complete();
}
Base.Document.Current.Hold = false;
using (PXTransactionScope ts = new PXTransactionScope())
{
Base.Release.Press();
ts.Complete();
}
List<ARPriceWorksheet> lst = new List<ARPriceWorksheet>
{
Base.Document.Current
};
return lst;
}
protected void ARPriceWorksheet_RowSelected(PXCache cache, PXRowSelectedEventArgs e, PXRowSelected InvokeBaseHandler)
{
if (InvokeBaseHandler != null)
InvokeBaseHandler(cache, e);
var row = (ARPriceWorksheet)e.Row;
uploadRecord.SetEnabled(row.Status != SPWorksheetStatus.Released);
}
}
}
First, do you need them all to be in a single transaction scope? This would revert all changes if any record throws an exception. If you need all of them committed together with no errors, rather than record by record, you would have to perform the updates this way.
I would suggest though moving your process to a custom processing screen. This way you can load the records, select one or many, and use the processing engine built into Acumatica to handle the process, rather than a single button click action. Here is an example: https://www.acumatica.com/blog/creating-custom-processing-screens-in-acumatica/
Based on the feedback that it must be all in a single transaction scope and thousands of records, I can only see two optimizations that may assist. First is increasing the Timeout as explained in this blog post. https://acumaticaclouderp.blogspot.com/2017/12/acumatica-snapshots-uploading-and.html
Next, I would load all records into memory first by calling ToList() and then loop through the list. That might save you time, as it should pull all records at once rather than once for each record.
Going from
foreach (PXResult<InventoryItem, CSAnswers, INItemCost> res in records.Select())
to
var recordList = records.Select().ToList();
foreach (PXResult<InventoryItem, CSAnswers, INItemCost> res in recordList)
I have different plugins in my Web API project, each with its own XML docs, and one centralized Help page. The problem is that Web API's default Help Page only supports a single documentation file:
new XmlDocumentationProvider(HttpContext.Current.Server.MapPath("~/App_Data/Documentation.xml"))
How can I load the documentation from multiple files? I want to do something like this:
new XmlDocumentationProvider("PluginsFolder/*.xml")
You can modify the installed XmlDocumentationProvider at Areas\HelpPage to do something like the following:
Merge multiple XML documentation files into a single one.
Example code (it is missing some error checks and validation):
using System.Xml.Linq;
using System.Xml.XPath;
XDocument finalDoc = null;
foreach (string file in Directory.GetFiles(@"PluginsFolder", "*.xml"))
{
if(finalDoc == null)
{
finalDoc = XDocument.Load(File.OpenRead(file));
}
else
{
XDocument xdocAdditional = XDocument.Load(File.OpenRead(file));
finalDoc.Root.XPathSelectElement("/doc/members")
.Add(xdocAdditional.Root.XPathSelectElement("/doc/members").Elements());
}
}
// Supply the navigator that rest of the XmlDocumentationProvider code looks for
_documentNavigator = finalDoc.CreateNavigator();
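Since the example above is described as missing some error checks, here is a hedged sketch of the same merge with a guard for files that are not XML documentation. `XmlDocMerger` is a hypothetical helper name, not part of the Help Page package. It builds a fresh `<doc><members>` document, so other elements from the source files (such as `<assembly>`) are intentionally left out.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class XmlDocMerger
{
    // Hypothetical helper: merges the <member> elements of several XML
    // documentation files into a single <doc><members> document.
    public static XDocument Merge(IEnumerable<XDocument> documents)
    {
        if (documents == null) throw new ArgumentNullException(nameof(documents));

        var members = new XElement("members");
        foreach (var document in documents)
        {
            // Skip null entries and files without a <doc><members> element.
            var source = document?.Root?.Name == "doc"
                ? document.Root.Element("members")
                : null;
            if (source == null) continue;

            // Adding elements that already have a parent copies them,
            // so the source documents are left unchanged.
            members.Add(source.Elements("member"));
        }
        return new XDocument(new XElement("doc", members));
    }
}
```

The merged document can then be handed to `CreateNavigator()` in the same way as `finalDoc` above.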
Kiran's solution works very well. I ended up using his approach but by creating a copy of XmlDocumentationProvider, called MultiXmlDocumentationProvider, with an altered constructor:
public MultiXmlDocumentationProvider(string xmlDocFilesPath)
{
XDocument finalDoc = null;
foreach (string file in Directory.GetFiles(xmlDocFilesPath, "*.xml"))
{
using (var fileStream = File.OpenRead(file))
{
if (finalDoc == null)
{
finalDoc = XDocument.Load(fileStream);
}
else
{
XDocument xdocAdditional = XDocument.Load(fileStream);
finalDoc.Root.XPathSelectElement("/doc/members")
.Add(xdocAdditional.Root.XPathSelectElement("/doc/members").Elements());
}
}
}
// Supply the navigator that rest of the XmlDocumentationProvider code looks for
_documentNavigator = finalDoc.CreateNavigator();
}
I register the new provider from HelpPageConfig.cs:
config.SetDocumentationProvider(new MultiXmlDocumentationProvider(HttpContext.Current.Server.MapPath("~/App_Data/")));
Creating a new class and leaving the original one unchanged may be more convenient when upgrading etc...
Rather than create a separate class along the lines of XmlMultiDocumentationProvider, I just added a constructor to the existing XmlDocumentationProvider. Instead of taking a folder name, it takes a list of file paths, so you can still specify exactly which files to include (if the directory holding the documentation XML also contains other XML files, loading the whole folder could get hairy). Here's my new constructor:
public XmlDocumentationProvider(IEnumerable<string> documentPaths)
{
if (documentPaths.IsNullOrEmpty())
{
throw new ArgumentNullException(nameof(documentPaths));
}
XDocument fullDocument = null;
foreach (var documentPath in documentPaths)
{
if (documentPath == null)
{
throw new ArgumentNullException(nameof(documentPath));
}
if (fullDocument == null)
{
using (var stream = File.OpenRead(documentPath))
{
fullDocument = XDocument.Load(stream);
}
}
else
{
using (var stream = File.OpenRead(documentPath))
{
var additionalDocument = XDocument.Load(stream);
fullDocument?.Root?.XPathSelectElement("/doc/members").Add(additionalDocument?.Root?.XPathSelectElement("/doc/members").Elements());
}
}
}
_documentNavigator = fullDocument?.CreateNavigator();
}
The HelpPageConfig.cs looks like this. (Yes, it can be fewer lines, but I don't have a line limit so I like splitting it up.)
var xmlPaths = new[]
{
HttpContext.Current.Server.MapPath("~/bin/Path.To.FirstNamespace.XML"),
HttpContext.Current.Server.MapPath("~/bin/Path.To.OtherNamespace.XML")
};
var documentationProvider = new XmlDocumentationProvider(xmlPaths);
config.SetDocumentationProvider(documentationProvider);
I agree with gurra777 that creating a new class is a safer upgrade path. I started with that solution but it involves a fair amount of copy/pasta, which could easily get out of date after a few package updates.
Instead, I am keeping a collection of XmlDocumentationProvider children. For each of the implementation methods, I'm calling into the children to grab the first non-empty result.
public class MultiXmlDocumentationProvider : IDocumentationProvider, IModelDocumentationProvider
{
private IList<XmlDocumentationProvider> _documentationProviders;
public MultiXmlDocumentationProvider(string xmlDocFilesPath)
{
_documentationProviders = new List<XmlDocumentationProvider>();
foreach (string file in Directory.GetFiles(xmlDocFilesPath, "*.xml"))
{
_documentationProviders.Add(new XmlDocumentationProvider(file));
}
}
public string GetDocumentation(System.Reflection.MemberInfo member)
{
return _documentationProviders
.Select(x => x.GetDocumentation(member))
.FirstOrDefault(x => !string.IsNullOrWhiteSpace(x));
}
// and so on for the remaining interface methods...
}
The HelpPageConfig registration is the same as in gurra777's answer:
config.SetDocumentationProvider(new MultiXmlDocumentationProvider(HttpContext.Current.Server.MapPath("~/App_Data/")));
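The remaining interface methods can follow the same "first non-empty result" pattern as GetDocumentation. As a sketch, the pattern can be pulled into one generic helper; `DocumentationLookup` is a hypothetical name and is written without any Web API types so it stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DocumentationLookup
{
    // Hypothetical helper: asks each provider in order and returns the first
    // result that is not null, empty, or whitespace; returns null if none match.
    public static string FirstNonEmpty<TProvider>(
        IEnumerable<TProvider> providers,
        Func<TProvider, string> getDocumentation)
    {
        return providers
            .Select(getDocumentation)
            .FirstOrDefault(text => !string.IsNullOrWhiteSpace(text));
    }
}
```

Each method would then reduce to a single call such as `DocumentationLookup.FirstNonEmpty(_documentationProviders, x => x.GetDocumentation(member))`. Because the LINQ query is evaluated lazily, providers after the first match are not queried.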
I have code like this:
public bool Set(IEnumerable<WhiteForest.Common.Entities.Projections.RequestProjection> requests)
{
var documentSession = _documentStore.OpenSession();
//{
try
{
foreach (var request in requests)
{
documentSession.Store(request);
}
//requests.AsParallel().ForAll(x => documentSession.Store(x));
documentSession.SaveChanges();
documentSession.Dispose();
return true;
}
catch (Exception e)
{
_log.LogDebug("Exception in RavenRequstRepository - Set. Exception is [{0}]", e.ToString());
return false;
}
//}
}
This code gets called many times. After around 50,000 documents have passed through it, I get an OutOfMemoryException.
Any idea why? Perhaps after a while I need to create a new DocumentStore?
Thank you.
UPDATE:
I ended up using the Batch/Patch API to perform the update I needed.
You can see the discussion here: https://groups.google.com/d/topic/ravendb/3wRT9c8Y-YE/discussion
Basically, since I only needed to update one property on my objects, and after considering Ayende's comments about re-serializing all the objects back to JSON, I did something like this:
internal void Patch()
{
List<string> docIds = new List<string>() { "596548a7-61ef-4465-95bc-b651079f4888", "cbbca8d5-be45-4e0d-91cf-f4129e13e65e" };
using (var session = _documentStore.OpenSession())
{
session.Advanced.DatabaseCommands.Batch(GenerateCommands(docIds));
}
}
private List<ICommandData> GenerateCommands(List<string> docIds )
{
List<ICommandData> retList = new List<ICommandData>();
foreach (var item in docIds)
{
retList.Add(new PatchCommandData()
{
Key = item,
Patches = new[] { new Raven.Abstractions.Data.PatchRequest () {
Name = "Processed",
Type = Raven.Abstractions.Data.PatchCommandType.Set,
Value = new RavenJValue(true)
}}});
}
return retList;
}
Hope this helps.
Thanks a lot.
I just did this for my current project. I chunked the data into pieces and saved each chunk in a new session. This may work for you, too.
Note, this example shows chunking by 1024 documents at a time, but needing at least 2000 before we decide it's worth chunking. So far, my inserts got the best performance with a chunk size of 4096. I think that's because my documents are relatively small.
internal static void WriteObjectList<T>(List<T> objectList)
{
int numberOfObjectsThatWarrantChunking = 2000; // Don't bother chunking unless we have at least this many objects.
if (objectList.Count < numberOfObjectsThatWarrantChunking)
{
// Just write them all at once.
using (IDocumentSession ravenSession = GetRavenSession())
{
objectList.ForEach(x => ravenSession.Store(x));
ravenSession.SaveChanges();
}
return;
}
int numberOfDocumentsPerSession = 1024; // Chunk size
List<List<T>> objectListInChunks = new List<List<T>>();
for (int i = 0; i < objectList.Count; i += numberOfDocumentsPerSession)
{
objectListInChunks.Add(objectList.Skip(i).Take(numberOfDocumentsPerSession).ToList());
}
Parallel.ForEach(objectListInChunks, listOfObjects =>
{
using (IDocumentSession ravenSession = GetRavenSession())
{
listOfObjects.ForEach(x => ravenSession.Store(x));
ravenSession.SaveChanges();
}
});
}
private static IDocumentSession GetRavenSession()
{
return _ravenDatabase.OpenSession();
}
Are you trying to save it all in one call?
The DocumentSession needs to turn all of the objects that you pass it into a single request to the server. That means that it may allocate a lot of memory for the write to the server.
We usually recommend batches of about 1,024 items if you are doing bulk saves.
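One way to get batches of that size without materializing every chunk up front is a small lazy batching helper. The sketch below is plain C# with no RavenDB calls; `Batching.InBatches` is a hypothetical name, and the default of 1,024 simply mirrors the recommendation above.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Hypothetical helper: lazily splits a sequence into lists of at most `size` items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int size = 1024)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size); // start a new list; the caller owns the old one
            }
        }
        if (batch.Count > 0) yield return batch; // final partial batch
    }
}
```

Each batch would then get its own short-lived session: open a session, call Store for every item in the batch, call SaveChanges, and dispose the session before moving on to the next batch.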
DocumentStore is a disposable class, so I worked around this problem by disposing the instance after each chunk. I highly doubt this is the most efficient way to run operations, but it will prevent significant memory overhead from happening.
I was running a sort of "delete all" operation like so. You can see the using blocks disposing both the DocumentStore and the IDocumentSession objects after each chunk.
static DocumentStore GetDataStore()
{
DocumentStore ds = new DocumentStore
{
DefaultDatabase = "test",
Url = "http://localhost:8080"
};
ds.Initialize();
return ds;
}
static IDocumentSession GetDbInstance(DocumentStore ds)
{
return ds.OpenSession();
}
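static void Main(string[] args)
{
// Declared outside the loop so the `while` condition can read deleteCount.
int deleteCount;
int deleteSum = 0;
do
{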
using (var ds = GetDataStore())
using (var db = GetDbInstance(ds))
{
//The `Take` operation will cap out at 1,024 by default, per Raven documentation
var list = db.Query<MyClass>().Skip(deleteSum).Take(5000).ToList();
deleteCount = list.Count;
deleteSum += deleteCount;
foreach (var item in list)
{
db.Delete(item);
}
db.SaveChanges();
list.Clear();
}
} while (deleteCount > 0);
}