What is the proper usage pattern for LINQ to Lucene's Index<T>?
It implements IDisposable, so I figured wrapping it in a using statement would make the most sense:
IEnumerable<MyDocument> documents = null;
using (Index<MyDocument> index = new Index<MyDocument>(new System.IO.DirectoryInfo(IndexRootPath)))
{
    documents = index.Where(d => d.Name.Like("term")).ToList();
}
I am occasionally experiencing unwanted deletion of the index on disk. It seems to happen 100% of the time when multiple instances of the Index exist at the same time. I wrote a test using PLINQ to run two searches in parallel; one search works while the other returns 0 results because the index has been emptied.
Am I supposed to use a single static instance instead?
Should I wrap it in a Lazy<T>?
Am I then opening myself up to other issues when multiple users access the static index at the same time?
I also want to re-index periodically as needed, likely using another process like a Windows service. Am I also going to run into issues if users are searching while the index is being rebuilt?
The code looks like Linq-to-Lucene.
Most cases of completely cleared Lucene indexes are caused by an IndexWriter being created with the create parameter set to true. The code in the question does not show the indexing side, so it is difficult to debug this further.
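To illustrate, a minimal sketch with plain Lucene.Net (assuming 3.x and the same IndexRootPath as in the question; linq-to-lucene constructs its IndexWriter internally, so this is the pattern to look for in its source or in your own indexing code):

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

var directory = FSDirectory.Open(new System.IO.DirectoryInfo(IndexRootPath));
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

// Passing create = true wipes whatever is already in the directory.
// Only pass true when the index does not exist yet.
bool createNew = !IndexReader.IndexExists(directory);
var writer = new IndexWriter(directory, analyzer, createNew, IndexWriter.MaxFieldLength.UNLIMITED);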
Lucene.Net is thread-safe, and I expect linq-to-lucene to exhibit the same behavior. A single static index instance would cache things in memory, but I guess you'll need to handle reloading the index after changes yourself (I do not know if linq-to-lucene does this for you).
There should be no problems using several searchers/readers while reindexing; Lucene is built to support that scenario. However, there can only be one writer per directory, so no other process can write documents to the index while your Windows service is optimizing it.
We appear to have a problem with MDriven generating the same ECO_ID for multiple objects. For the most part it seems to happen in conjunction with unexpected process shutdowns and/or server shutdowns, but it does also happen during normal activity.
Our system consists of one ASP.NET application and one WinForms application. The ASP.NET app is setup in IIS to use a single worker process. We have a mixture of WebForms and MVC, including ApiControllers. We're using a rather old version of the ECO packages: 7.0.0.10021. We're on VS 2017, target framework is 4.7.1.
We have it configured to use 64-bit integers for object ids. The database is Firebird. The SQL configuration is set to use ReadCommitted transaction isolation.
As far as I can tell we have configured EcoSpaceStrategyHandler with EcoSpaceStrategyHandler.SessionStateMode.Never, which should mean that EcoSpaces are not reused at all, right? (Why would I even use EcoSpaceStrategyHandler in this case, instead of just creating EcoSpace normally with the new keyword?)
We have created MasterController : Controller and MasterApiController : ApiController classes that we use for all our controllers. These have an EcoSpace property that simply does this:
if (ecoSpace == null)
{
    if (ecoSpaceStrategyHandler == null)
        ecoSpaceStrategyHandler = new EcoSpaceStrategyHandler(
            EcoSpaceStrategyHandler.SessionStateMode.Never,
            typeof(DiamondsEcoSpace),
            null,
            false
        );
    ecoSpace = (DiamondsEcoSpace)ecoSpaceStrategyHandler.GetEcoSpace();
}
return ecoSpace;
I.e. if no strategy handler has been created, create one specifying no pooling and no session state persisting of eco spaces. Then, if no ecospace has been fetched, fetch one from the strategy handler. Return the ecospace. Is this an acceptable approach? Why would it be better than simply doing this:
if (ecoSpace == null)
    ecoSpace = new DiamondsEcoSpace();
return ecoSpace;
In aspx we have a master page that has an EcoSpaceManager. It has been configured to use a pool but SessionStateMode is Never. It has EnableViewState set to true. Is this acceptable? Does it mean that EcoSpaces will be pooled but inactivated between round trips?
It is possible that we receive multiple incoming API calls in tight succession, so that one API call hasn't been completed before the next one comes in. I assume that this means that multiple instances of MasterApiController can execute simultaneously but in separate threads. There may of course also be MasterController instances executing MVC requests and also the WinForms app may be running some batch job or other.
But as far as I understand id reservation is made at the beginning of any UpdateDatabase call, in this way:
update "ECO_ID" set "BOLD_ID" = "BOLD_ID" + :N;
select "BOLD_ID" from "ECO_ID";
If the returned value is K, this will reserve N new ids ranging from K - N to K - 1. Using ReadCommitted transactions everywhere should ensure that the update locks the id data row, forcing any concurrent save operations to wait, then fetches the update result without interference from other transactions, then commits. At that point any other pending save operation can proceed with its own id reservation. I fail to see how this could result in the same id being used for multiple objects.
I should note that it does seem like it sometimes produces id duplicates within one single UpdateDatabase, i.e. when saving a set of new related objects, some of them end up with the same id. I haven't really confirmed this though.
Any ideas what might be going on here? What should I look for?
The issue is most likely that you use ReadCommitted isolation.
This allows two systems to start a transaction at the same time, read the current value, increase it by their batch size, and then save one after the other, ending up with overlapping id ranges.
You must use Serializable isolation for key generation, i.e. only read values that are not currently part of a write operation.
MDriven uses two settings for the isolation level: UpdateIsolationLevel and FetchIsolationLevel.
Set your UpdateIsolationLevel to Serializable.
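For illustration only (this is not MDriven code; in MDriven you only change the UpdateIsolationLevel setting and the framework runs the reservation itself), here is roughly what the reservation described in the question looks like when executed under Serializable isolation with plain ADO.NET, so that two concurrent reservations cannot read the same starting value:

using System;
using System.Data;
using System.Data.Common;

// Illustrative sketch, not MDriven internals: reserve n ids under Serializable isolation.
static long ReserveIds(DbConnection connection, int n)
{
    using (DbTransaction transaction = connection.BeginTransaction(IsolationLevel.Serializable))
    using (DbCommand command = connection.CreateCommand())
    {
        command.Transaction = transaction;

        command.CommandText = "update \"ECO_ID\" set \"BOLD_ID\" = \"BOLD_ID\" + " + n;
        command.ExecuteNonQuery();

        command.CommandText = "select \"BOLD_ID\" from \"ECO_ID\"";
        long k = Convert.ToInt64(command.ExecuteScalar());

        transaction.Commit();

        // Reserved range is [k - n, k - 1].
        return k - n;
    }
}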
What would a Cosmos stored procedure look like that would set the PumperID field for every record to a default value?
We need to do this to repair some data, so the procedure would visit every record that has a PumperID field (not all docs have one) and set it to a default value.
Assuming a one-time data maintenance task, arguably the simplest solution is to create a single-purpose .NET Core console app and use the SDK to query for the items that require changes and perform the updates. I've used this approach to rename properties, for example. This works for any Cosmos database and doesn't require deploying any stored procs or anything else.
Ideally, it is designed to be idempotent so it can be run multiple times, in case several passes are required to catch new data coming in. If the item count is large, one could optionally use the SDK operations to scale up throughput on start and scale back down when finished. For performance, run it close to the endpoint, on an Azure Virtual Machine or Function.
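A minimal sketch of that console app using the .NET SDK (Microsoft.Azure.Cosmos); the connection string, database and container names, partition key, and default value below are placeholders you would substitute:

using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json.Linq;

class Program
{
    static async Task Main()
    {
        CosmosClient client = new CosmosClient("<connection string>");
        Container container = client.GetContainer("MyDatabase", "MyContainer");

        // Only touch documents that actually have a PumperID field.
        QueryDefinition query = new QueryDefinition("SELECT * FROM c WHERE IS_DEFINED(c.PumperID)");

        using (FeedIterator<JObject> iterator = container.GetItemQueryIterator<JObject>(query))
        {
            while (iterator.HasMoreResults)
            {
                foreach (JObject doc in await iterator.ReadNextAsync())
                {
                    doc["PumperID"] = "default value";
                    string id = doc["id"].Value<string>();

                    // Assumes the container is partitioned on /id; use your real partition key otherwise.
                    await container.ReplaceItemAsync(doc, id, new PartitionKey(id));
                }
            }
        }
    }
}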
For scenarios where you want to iterate through every item in a container and update a property, the best means to accomplish this is to use the Change Feed Processor and run the operation in an Azure Function or VM. See Change Feed Processor to learn more, and the examples to start with.
With Change Feed you will want to start reading from the beginning of the container. To do this, see Reading Change Feed from the beginning.
Then within your delegate you will read each item off the change feed, check its value, and call ReplaceItemAsync() to write it back if it needs to be updated.
static async Task HandleChangesAsync(IReadOnlyCollection<MyType> changes, CancellationToken cancellationToken)
{
    Console.WriteLine("Started handling changes...");
    foreach (MyType item in changes)
    {
        if (item.PumperID == null)
        {
            item.PumperID = "some value";
            // call ReplaceItemAsync(), etc.
        }
    }
    Console.WriteLine("Finished handling changes.");
}
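For completeness, a sketch of wiring up the processor itself (the CosmosClient, container names, processor name, and instance name are placeholders); WithStartTime(DateTime.MinValue.ToUniversalTime()) is what makes it read from the beginning of the container:

// Sketch only: assumes a CosmosClient and a lease container already exist.
Container monitoredContainer = client.GetContainer("MyDatabase", "MyContainer");
Container leaseContainer = client.GetContainer("MyDatabase", "leases");

ChangeFeedProcessor processor = monitoredContainer
    .GetChangeFeedProcessorBuilder<MyType>("pumperIdFixer", HandleChangesAsync)
    .WithInstanceName("consoleHost")
    .WithLeaseContainer(leaseContainer)
    .WithStartTime(DateTime.MinValue.ToUniversalTime()) // read from the beginning of the container
    .Build();

await processor.StartAsync();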
I just need a suggestion in this case. There is a PIN code field in my ASP.NET project. I have stored around 50,000 PIN codes in a SQL Server database, and I have a drop-down that gets its values from the database. When I run the project on localhost, it slows down. I think this is because a huge amount of data is being rendered into the HTML; when I click View Source at run time, I can see all the PIN codes inside it.
Moreover, I have also done the same thing for selecting CITY and STATE from the database.
I would really appreciate any logic or technique to lessen this slowdown.
If you are using all the PIN codes on a single page, you have multiple options to optimize this slowdown. If you are still in the initial phase, try MongoDB or another NoSQL database; otherwise go for Solr or Redis, which give fast access to the data. If you are not able to use these, you can optimize with eager loading and caching of the data.
If it is not all needed on a single page, break it into batches by paginating the PIN codes, as in the sketch below.
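For the paging route, a rough sketch (hypothetical table and column names, SQL Server 2012+ for OFFSET/FETCH) of a lookup that returns one page of PIN codes matching what the user has typed, so only that page is ever rendered into the HTML:

using System.Collections.Generic;
using System.Data.SqlClient;

// Hypothetical helper: returns one page of PIN codes matching a typed prefix.
static List<string> GetPinCodes(string prefix, int pageIndex, int pageSize)
{
    var result = new List<string>();
    using (var connection = new SqlConnection("<connection string>"))
    using (var command = new SqlCommand(
        @"SELECT PinCode FROM PinCodes
          WHERE PinCode LIKE @prefix + '%'
          ORDER BY PinCode
          OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY", connection))
    {
        command.Parameters.AddWithValue("@prefix", prefix ?? "");
        command.Parameters.AddWithValue("@offset", pageIndex * pageSize);
        command.Parameters.AddWithValue("@pageSize", pageSize);

        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
                result.Add(reader.GetString(0));
        }
    }
    return result;
}

Wire this up to an AJAX autocomplete or a paged dropdown instead of binding all 50,000 values at once.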
This is a common problem for any website that deals with a large amount of data. To be frank, there is no pure code-level fix for this; you need to pick one of the following approaches.
You can try multiple options for faster retrieval.
Caching -
Use Redis or Memcached. In simple terms, on the first request the cache manager will read your data from SQL Server and store it; subsequent requests will be served from the cache.
Also, don't forget to make a provision to invalidate the data when new pin codes are added.
Edit: You can also use the object caching provided by the .NET Framework. Refer to: object caching.
The code will be something like this:
DataTable pinCodeObject = (DataTable)Cache["key_pincodes"];

if (pinCodeObject == null)
{
    // Nothing in the cache yet: read the data from the database and cache it
    // with a 10-minute sliding expiration (DateTime.MaxValue means no absolute expiration).
    pinCodeObject = GetPinCodesFromdatabase();
    Cache.Insert("key_pincodes", pinCodeObject, null, DateTime.MaxValue, TimeSpan.FromMinutes(10));
}

// bind pinCodeObject to your dropdown
NoSQL database -
MongoDB, XML, or text files could be used to read the data. It will take much less time than a database hit.
In Meteor, I have a collection that the client subscribes to. In some cases, instead of publishing the documents that exist in the collection on the server, I want to send down some bogus data. Now that's fine using the this.added function in the publish.
My problem is that I want to treat the bogus doc as if it were a real document, specifically this gets troublesome when I want to update it. For the real docs I run a RealDocs.update but when doing that on the bogus doc it fails since there is no representation of it on the server (and I'd like to keep it that way).
A collection API that allowed me to pass something like local = true would be fantastic, but I have no idea how difficult that would be to implement, and I'm not too fond of modifying the core code.
Right now I'm stuck with creating a BogusDocs = new Meteor.Collection(null), but that makes populating the collection more difficult, since I have to either hard-code fixtures in the client code or use a method to get the data from the server, and I have to make sure I call BogusDocs.update instead of RealDocs.update as soon as I'm dealing with bogus data.
Maybe I could actually insert the data on the server and make sure it's removed later, but the data really has nothing to do with the server side collection so I'd rather avoid that.
Any thoughts on how to approach this problem?
After some further investigation (the EventedMind site) it turns out that one can modify the local collection without making calls to the server. This is done by running the same methods as usual, but on MyCollection._collection instead of on MyCollection itself; MyCollection.update() thus becomes MyCollection._collection.update(). So, using a simple wrapper, one can pass the usual arguments to an update call to update the collection as usual (which will call the server and in turn trigger your allow/deny rules), or add 'local' as the last argument to only perform the update in the client collection. Something like this should do it:
DocsUpdateWrapper = function () {
  var lastIndex = arguments.length - 1;
  if (arguments[lastIndex] === 'local') {
    // Strip the trailing 'local' flag and update only the client-side collection.
    var args = Array.prototype.slice.call(arguments, 0, lastIndex);
    Docs._collection.update.apply(Docs._collection, args);
  } else {
    // Normal update: goes through the server and your allow/deny rules.
    Docs.update.apply(Docs, arguments);
  }
};
(This could of course be extended to a DocsWrapper that allows for insertions and removals too. I haven't tried this function yet, but it should serve well as an example.)
The biggest benefit of this, in my opinion, is that we can use the exact same calls to retrieve documents from the local collection, regardless of whether they are only local or also live on the server. By adding a simple boolean to the doc we can keep track of which documents are only local and which are not (an improved DocsWrapper could check for that boolean, so we could even omit passing the 'local' argument), so we know how to update them.
There are some people working on local storage in the browser:
https://github.com/awwx/meteor-browser-store
You might be able to adapt some of their ideas to provide "fake" documents.
I would use the transform feature on the collection to make an object that knows what to do with itself (on the client). Give it the correct update method (real/bogus), then call .update rather than a general one.
You can put the code from this.added into the transform process.
You can also set up a local minimongo collection and insert into it in the callback:
FoundAgents = new Meteor.Collection(null, Agent.transformData)
FoundAgents.remove({})
Meteor.call 'Get_agentsCloseToOffer', me, ping, (err, data) ->
  if err
    console.log JSON.stringify(err, null, 2)
  else
    _.each data, (item) ->
      FoundAgents.insert(item)
Maybe this is interesting for you as well: I created two examples with native Meteor local collections at MeteorPad. The first pad shows an example with a plain reactive recordset: Sample_Publish_to_Local-Collection. The second uses the collection's .observe method to listen to data: Collection.observe().
I have a requirement to create an HttpHandler that will serve an image file (a simple static file) and also insert a record into a SQL Server table (e.g. http://site/some.img, where some.img is handled by the HttpHandler). I need an in-memory object (like a generic List) that I can add items to on each request (I also have to consider a few hundred or thousand requests per second), and I should be able to unload this in-memory object to a SQL table using SqlBulkCopy.
List --> DataTable --> SqlBulkCopy
I thought of using the Cache object: create a generic List, save it in HttpContext.Cache, and add a new item to it on every request. This did NOT work, as the CacheItemRemovedCallback fired right away when the HttpHandler tried to add a new item, so I can't use the Cache object as an in-memory queue.
Can anybody suggest anything? Would I be able to scale in the future if the load increases?
Why would CacheItemRemovedCallback fire when you ADD something to the queue? That doesn't make sense to me... Even if it does fire, there's no requirement to do anything here. Perhaps I am misunderstanding your requirements?
I have quite successfully used the Cache object in precisely this manner. That is what it's designed for and it scales pretty well. I stored a Hashtable which was accessed on every app page request and updated/cleared as needed.
Option two... do you really need the queue? SQL Server will scale pretty well also if you just want to write directly into the DB. Use a shared connection object and/or connection pooling.
How about just using a generic List to store the requests and a different thread to do the SqlBulkCopy?
This way, storing requests in the list won't block the response for too long, and the background thread will be able to update SQL on its own schedule, say every 5 minutes or so; see the sketch below.
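A sketch of that idea under stated assumptions (ConcurrentQueue requires .NET 4+; the class, table, and column names are made up): requests are enqueued without blocking, and a timer drains the queue into a DataTable and bulk-copies it every five minutes.

using System;
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.Threading;

public static class RequestLog
{
    private static readonly ConcurrentQueue<string> Queue = new ConcurrentQueue<string>();
    private static readonly Timer FlushTimer =
        new Timer(_ => Flush(), null, TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5));

    // Called from the HttpHandler on every image request.
    public static void Add(string imageName)
    {
        Queue.Enqueue(imageName);
    }

    private static void Flush()
    {
        DataTable table = new DataTable();
        table.Columns.Add("ImageName", typeof(string));
        table.Columns.Add("RequestedAt", typeof(DateTime));

        string imageName;
        while (Queue.TryDequeue(out imageName))
            table.Rows.Add(imageName, DateTime.UtcNow);

        if (table.Rows.Count == 0)
            return;

        using (SqlConnection connection = new SqlConnection("<connection string>"))
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.ImageRequests";
            connection.Open();
            bulkCopy.WriteToServer(table);
        }
    }
}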
You can even base the background thread on the Cache mechanism by performing the work in a CacheItemRemovedCallback.
Just insert some object with a removal time of 5 minutes and reinsert it at the end of the processing work.
Thanks Alex & Bryan for your suggestions.
Bryan: When I try to replace the List object in the Cache for the second request (now the count should be 2), the CacheItemRemovedCallback gets fired because I'm replacing the current Cache object with the new one. Initially I also thought this was weird behavior, so I'll have to look deeper into it.
Also, for the second suggestion, I will try to insert the record (with a cached SqlConnection object) and see what performance I get in a stress test. I doubt I'll get fantastic numbers, as it's an I/O operation.
Meanwhile, I'll keep digging on my side for an optimal solution based on your suggestions.
You can create a conditional check within the callback to ensure you are working on a cache entry that was removed due to expiration rather than an explicit remove/replace (in VB since I had it handy):
Private Shared Sub CacheRemovalCallbackFunction(ByVal cacheKey As String, ByVal cacheObject As Object, ByVal removalReason As Web.Caching.CacheItemRemovedReason)
    Select Case removalReason
        Case Web.Caching.CacheItemRemovedReason.Expired, Web.Caching.CacheItemRemovedReason.DependencyChanged, Web.Caching.CacheItemRemovedReason.Underused
            ' By leaving off Web.Caching.CacheItemRemovedReason.Removed, this excludes items that are replaced or removed explicitly (Cache.Remove)
    End Select
End Sub
Edit: Here it is in C# if you need it:
private static void CacheRemovalCallbackFunction(string cacheKey, object cacheObject, System.Web.Caching.CacheItemRemovedReason removalReason)
{
    switch (removalReason)
    {
        case System.Web.Caching.CacheItemRemovedReason.DependencyChanged:
        case System.Web.Caching.CacheItemRemovedReason.Expired:
        case System.Web.Caching.CacheItemRemovedReason.Underused:
            // This excludes System.Web.Caching.CacheItemRemovedReason.Removed, which is triggered when you overwrite a cache item or remove it explicitly (e.g., HttpRuntime.Cache.Remove(key))
            break;
    }
}
To expand on my previous comment... I get the impression you are thinking about the cache incorrectly. If you have an object stored in the Cache, say a Hashtable, any update to that Hashtable will be persisted without you explicitly modifying the contents of the Cache. You only need to add the Hashtable to the Cache once, either at application startup or on the first request.
If you are worried about the bulk copy and page-request updates happening simultaneously, then I suggest you simply have TWO cached lists. Have one be the list which is updated as page requests come in, and one list for the bulk copy operation. When one bulk copy is finished, swap the lists and repeat. This is similar to double-buffering video RAM for video games or video apps.
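A sketch of that double-buffering idea (the class and member names are made up): page requests always append to the active list, and the bulk-copy job atomically swaps in an empty list before draining the filled one.

using System.Collections.Generic;

public static class RequestBuffer
{
    private static readonly object Sync = new object();
    private static List<string> active = new List<string>();

    // Called from the HttpHandler/page requests.
    public static void Add(string item)
    {
        lock (Sync)
        {
            active.Add(item);
        }
    }

    // Called by the bulk-copy job: swap in an empty list and hand back the filled one.
    public static List<string> Swap()
    {
        lock (Sync)
        {
            List<string> filled = active;
            active = new List<string>();
            return filled;
        }
    }
}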