I would like to know how to use a collection on the LHS of a rule (with contains or memberOf) that can be managed in Guvnor and may contain a large number of elements (perhaps tens of thousands). Take a blacklist match as an example: how can I maintain a big blacklist in Guvnor efficiently?
Any ideas?
Models are something like this:
declare MyItem
end
and
declare MyList extends ArrayList
end
You need to use a formula for the field. Click the list, then New Formula, and write:
this contains myItem
Or, when using memberOf, use a formula on the MyItem side instead:
this memberOf myList
You could just insert each blacklist item into the working memory. This is more efficient than using memberOf or contains, and it also makes it easier to write rules using Guvnor. If the list contains strings, you can do this:
declare ListItem
    name : String
end
You can use Drools to insert the facts from the list (you can also do this before firing the rules in Java code; a sketch of that Java variant follows the rule below):
rule "Just to empty the list"
when
list:List()
then
for(String name:list)
insert(Item(name));
end
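If you would rather do it from Java code before firing the rules (as mentioned above), a rough sketch could look like this. It assumes ListItem is defined as a plain Java class (imported into the DRL) rather than a declared type, and that ksession is an already-created session (StatefulKnowledgeSession in Drools 5, KieSession in Drools 6+); loadBlacklist() is a placeholder for however you load the entries:
import java.util.List;

// Plain fact class mirroring the ListItem type above (hypothetical)
public class ListItem {
    private final String name;
    public ListItem(String name) { this.name = name; }
    public String getName() { return name; }
}

// Somewhere in the application, before firing the rules:
List<String> blacklist = loadBlacklist();      // however you load it, e.g. from Guvnor or a database
for (String name : blacklist) {
    ksession.insert(new ListItem(name));       // one fact per blacklist entry
}
ksession.fireAllRules();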
Now you can write the rules against the black list items.
First of all, I would recommend testing this before you start optimising your application design. It could be a complete non-issue!
However, if you have a huge blacklist, then inserting it into the working memory might be the slow step. The fact insertion will lead to it being propagated through the Rete network, and depending on how many rules depend on its contents, this could be a slow process. Once the Rete network has been updated, subsequent evaluations should be quick.
Therefore, depending on how your rules/sessions operate, one technique to optimise performance could be to create a long-running session and insert the blacklist on start-up. With this, you could insert a 'request' fact and see what fires, then retract the request, so working memory is ready for the next request.
Unfortunately, this does raise a few other issues. For instance, depending on your rules, you may need to synchronize access to the session, rather than sharing it. This leads to a situation where responses are fast, but must also wait for one another to finish, which could be a scalability issue. If that were to be a problem, you could create a pool of sessions to support simultaneous querying by multiple users.
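As a rough sketch of that long-running-session approach (method names follow the Drools 6 KieSession API; in Drools 5 the StatefulKnowledgeSession equivalent uses retract instead of delete, and kieContainer, loadBlacklist and ListItem are placeholders):
// imports: org.kie.api.runtime.KieSession, org.kie.api.runtime.rule.FactHandle

// At start-up: create the session once and preload the blacklist
KieSession ksession = kieContainer.newKieSession();
for (String name : loadBlacklist()) {
    ksession.insert(new ListItem(name));       // the blacklist stays in working memory
}

// Per request (synchronize access or use a session pool if requests are concurrent):
FactHandle handle = ksession.insert(request);  // 'request' is the fact to evaluate
ksession.fireAllRules();
ksession.delete(handle);                       // retract the request so the session is ready for the next one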
How would one validate a unique constraint using DDD? Let's say that an Entity has a property name that must be unique across the system, and there is a specific EntityRepository method nameExists(name): bool... This is what I found people suggest doing, because the repository is the abstraction of the collection of all the Entities and should be able to perform this check.
So before creating/adding the new Entity, the command / domain service could check for the existence of a newName against the repository, but I think that this will not always work because of concurrency.
In a concurrent scenario where two transactions are started simultaneously, the EntityRepository's nameExists method might return false for both transactions, and as a result of this two entries with the same name will be incorrectly inserted.
I am sure that I am missing something basic, but the answers I found all point to the repository exists method - TBH others say that a UNIQUE constraint should be put on the DB to catch the concurrency case, but what if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Follow-up question:
What if the uniqueness constraint is to be applied in different levels of a hierarchy?
A Container's name must be unique in the system and then Child names must be unique inside a Container.
Let's say that a transactional DB takes care of the uniqueness at the lowest possible level, what about the domain?
Should I still express the uniqueness logic at the domain level, e.g. with a Domain Service for the system-level uniqueness and embedding Child entities inside the Container entity and having a business rule (and therefore making Container the aggregate root)?
Or should I not bother with "replicating" the uniqueness in the domain and (given there are no other rules to apply between the two) split Container and Child? Will the domain lack expressiveness then?
I am sure that I am missing something basic
Not something basic.
The term we normally use for enforcing a constraint, like uniqueness, across a set of entities is set validation. Greg Young calls your attention to a specific question:
What is the business impact of having a failure?
Most set constraints fall into one of two categories:
constraints that need to be true when the system reaches steady state, but may not hold while work is in progress. In business processes, these are often handled by detecting conflicts in the stored data, and then invoking various mitigation processes to resolve the conflict.
constraints that need to be true always.
The first category includes things like double booking a seat on an airplane; it's not necessarily a problem unless both people show up, and even then you can handle it by bumping someone to another seat, or another flight.
In these cases, you make a best effort - you look at a recent copy of the set, make sure there are no conflicts there, then hope for the best (accepting that some percentage of the time, you'll have missed a change).
See Memories, Guesses and Apologies (Pat Helland, 2007).
The second category is the hard one; to ensure the invariant holds you have to lock the entire set to ensure that races don't allow two different writers to insert conflicting information.
Relational databases tend to be really good at set validation - putting the entire set into a single database is going to be the right answer (note the assumption that the set is small enough to fit into a single database -- trying to lock two databases at the same time is hard).
Another possibility is to ensure that only one writer can update the set at any given time -- you don't have to worry about losing a race when you are the only one running in it.
Sometimes you can lock a smaller set -- imagine, for example, having a collection of locks with numbers, and the hash code for the name tells you which lock you have to grab.
The simplest version of this is when you can use the name as the aggregate identifier itself.
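A minimal sketch of that lock-striping idea in Java (class and method names here are just illustrative):
import java.util.concurrent.locks.ReentrantLock;

public class NameRegistry {
    private static final int STRIPES = 64;
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];

    public NameRegistry() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // The hash of the name picks the lock, so two writers can only race
    // if their names land on the same stripe -- and then one of them waits.
    public void withLockFor(String name, Runnable action) {
        ReentrantLock lock = locks[Math.floorMod(name.hashCode(), STRIPES)];
        lock.lock();
        try {
            action.run();   // e.g. check nameExists(name) and then persist
        } finally {
            lock.unlock();
        }
    }
}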
if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Sometimes, you introduce a persistent store dedicated to the set, just to ensure that you can maintain the invariant. See "microservices".
But if you can't change the database, and you can't use a database with the locking guarantees that you need, and the business absolutely has to have the set valid at all times... then you single thread that part of the work.
Everybody that wants to change a name puts a request into a queue, and the one thread responsible for managing the invariant certifies each and every change.
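As an illustration (all names made up), the single-writer approach could look roughly like this in Java:
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class NameChangeProcessor implements Runnable {
    private final BlockingQueue<String> requests = new ArrayBlockingQueue<String>(1024);
    private final Set<String> takenNames = new HashSet<String>();   // or backed by the repository

    public void submit(String newName) throws InterruptedException {
        requests.put(newName);   // producers just enqueue; a real version would hand back a result/future
    }

    @Override
    public void run() {          // the only thread that ever touches takenNames
        while (!Thread.currentThread().isInterrupted()) {
            try {
                String name = requests.take();
                if (takenNames.add(name)) {
                    // unique: persist the change here
                } else {
                    // duplicate: reject and notify the caller
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}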
There's no magic; just hard work and trade offs.
We want to use Riak's Links to create a doubly linked list.
The algorithm for it is quite simple, I believe:
Let 'N0' be the new element to insert
Get the head of the list, including its 'next' link (N1)
Set the 'previous' of N1 to be N0.
Set the 'next' of N0 to be N1
Set the 'next' of the head of the list to be N0.
The problem that we have is that there is an obvious race condition here, because if 2 concurrent clients get the head of the list, one of the items will likely be 'lost'. Any way to avoid that?
Riak is an eventually consistent system in terms of the CAP theorem.
Provided you set the bucket property allow_multi=true, if two concurrent clients get the head of the list then write, you will have sibling records. On your next read you'll receive multiple values (siblings) and will then have to resolve the conflict and write the result. Given that we don't have any sort of atomicity this will possibly lead to additional conflicts under heavy write concurrency as you attempt to update the linked objects. Not impossible to resolve, but definitely tricky.
You're probably better off simply serializing the entire list into a single object. This makes your conflict resolution much simpler.
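If you go with a single serialized list plus allow_multi, conflict resolution on read can be as simple as merging the sibling values. Here is a sketch using plain Java collections (the Riak client calls themselves are left out, and this only handles additions; deletions would need tombstones or timestamps):
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ListMerger {
    // Merge the sibling versions of the serialized list into one value:
    // keep every element that appears in any sibling, preserving first-seen order.
    public static List<String> merge(List<List<String>> siblings) {
        Set<String> merged = new LinkedHashSet<String>();
        for (List<String> sibling : siblings) {
            merged.addAll(sibling);
        }
        return new ArrayList<String>(merged);
    }
}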
I have several graphs. The breadth and depth of each graph can vary and will change during runtime. See example graph.
There is a root node to get a hold on the whole graph (i.e. tree). A node can have several children and each child serves a special purpose. Furthermore a node can access all its direct children in order to retrieve certain informations. On the other hand a child node may not be aware of its own parent node, nor other siblings. Nothing spectacular so far.
Storing each graph and updating it with an object database (in this case DB4O) looks pretty straightforward. I could have used a relational database to accomplish data persistence (including database triggers, etc.) but I wanted to realize it with an object database instead.
There is one peculiar thing with my graphs. See another example graph.
To properly perform calculations some nodes require information from other nodes. These other nodes may be siblings, children/grandchildren or related in some other way. In this case a specific node knows the other relevant nodes as well (and thus can get the required information directly from them). For the sake of simplicity the first image didn't show all potential connections.
If one node has a change of state (e.g. triggered by an internal timer or triggered by some other node) it will inform other nodes (interested observers, see also observer pattern) about the change. Each informed node will then take appropriate actions to update its own state (and in turn inform other observers as needed). A root node will not know about every change that occurs, since only the involved nodes will know that something has changed. If such a chain of events is triggered by the root node then of course it's not much of an issue.
The aim is to assure data persistence with an object database. Data in memory should be in sync with data stored within the database. What adds to the complexity is the fact that the graphs don't consist of simple (and stupid) data nodes, but that lots of functionality is integrated in each node (i.e. events that trigger state changes throughout a graph).
I have several rough ideas on how to cope with the presented issue (e.g. (1) stronger separation of data and functionality, (2) stronger integration of the database, or (3) setting an arbitrary time interval to update data and accepting that data may be out of sync for a period of time). I'm looking for some more input and options concerning such a key issue (which will definitely leave significant footprints on a concrete implementation).
Edit:
There is another aspect I forgot to mention. A graph should not reside in memory all the time. Graphs that are not needed will only be present in the database and thus put in a state of suspension. This is another issue which needs consideration. While in suspension, the update mechanisms will probably be put to sleep as well, and this is not intended.
In the case of db4o check out "transparent activation" to automatically load objects on demand as you traverse the graph (this way the graph doesn't have to be all in memory) and check out "transparent persistence" to allow each node to persist itself after a state change.
http://www.gamlor.info/wordpress/2009/12/db4o-transparent-persistence/
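For reference, the usual db4o transparent activation/persistence pattern in Java looks roughly like the sketch below (this is only an outline; check the db4o documentation for the exact interfaces in your version):
import com.db4o.activation.ActivationPurpose;
import com.db4o.activation.Activator;
import com.db4o.ta.Activatable;
import java.util.ArrayList;
import java.util.List;

public class Node implements Activatable {
    private transient Activator activator;       // injected by db4o, never stored
    private List<Node> children = new ArrayList<Node>();

    public void bind(Activator activator) {
        if (this.activator == activator) return;
        if (activator != null && this.activator != null) {
            throw new IllegalStateException("Object bound to two containers");
        }
        this.activator = activator;
    }

    public void activate(ActivationPurpose purpose) {
        if (activator != null) activator.activate(purpose);
    }

    public List<Node> getChildren() {
        activate(ActivationPurpose.READ);        // loads this node's data on demand
        return children;
    }

    public void addChild(Node child) {
        activate(ActivationPurpose.WRITE);       // marks this node dirty for transparent persistence
        children.add(child);
    }
}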
Moreover you can use db4o "callbacks" to trigger custom behavior during db4o operations.
HTH
German
What's the exact question? Here are a few comments:
As #German already mentioned: For complex object graphs you probably want to use transparent persistence.
Also, as #German mentioned: callbacks can help you to do additional stuff when objects are read/written etc. on the database.
Regarding the Observer pattern: are you on .NET or Java? Usually you don't want to store the observers in the database, since the observers are usually part of your business logic, GUI etc. On .NET, events are automatically excluded from storage. On Java, make sure that you mark the field holding the observer references as transient.
In case you actually want to store observers, for example because they are just other elements in your object graph: on .NET you cannot store delegates/closures, so you need to introduce an interface for calling the observer. On Java, we often use anonymous inner classes as listeners; while db4o can store those, I would NOT recommend it, because an anonymous inner class gets a generated name which can change, and db4o will then not find that class later if you've changed your code.
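So on the Java side a node could look something like this sketch (NodeListener is a made-up listener interface; the transient field keeps db4o from storing the observers):
import java.util.ArrayList;
import java.util.List;

// Made-up listener interface; a named interface keeps the stored type name stable
// if you ever did decide to persist observers.
interface NodeListener {
    void nodeChanged(GraphNode node);
}

public class GraphNode {
    private String state;                                  // persisted by db4o
    // transient: db4o will not store the observers; they are re-registered at runtime
    private transient List<NodeListener> observers = new ArrayList<NodeListener>();

    public void addObserver(NodeListener listener) {
        if (observers == null) {                           // the field is null after loading from the database
            observers = new ArrayList<NodeListener>();
        }
        observers.add(listener);
    }

    public void changeState(String newState) {
        this.state = newState;
        if (observers != null) {
            for (NodeListener listener : observers) {
                listener.nodeChanged(this);                // notify interested observers
            }
        }
    }
}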
That's it. Ask more detailed questions if you want to know more.
I'd like some advice on designing a REST API which will allow clients to add/remove large numbers of objects to a collection efficiently.
Via the API, clients need to be able to add items to the collection and remove items from it, as well as manipulating existing items. In many cases the client will want to make bulk updates to the collection, e.g. adding 1000 items and deleting 500 different items. It feels like the client should be able to do this in a single transaction with the server, rather than requiring 1000 separate POST requests and 500 DELETEs.
Does anyone have any info on the best practices or conventions for achieving this?
My current thinking is that one should be able to PUT an object representing the change to the collection URI, but this seems at odds with the HTTP 1.1 RFC, which seems to suggest that the data sent in a PUT request should be interpreted independently from the data already present at the URI. This implies that the client would have to send a complete description of the new state of the collection in one go, which may well be very much larger than the change, or even be more than the client would know when they make the request.
Obviously, I'd be happy to deviate from the RFC if necessary but would prefer to do this in a conventional way if such a convention exists.
You might want to think of the change task as a resource in itself. So you're really PUT-ing a single object, which is a Bulk Data Update object. Maybe it's got a name, owner, and big blob of CSV, XML, etc. that needs to be parsed and executed. In the case of CSV you might want to also identify what type of objects are represented in the CSV data.
List jobs, add a job, view the status of a job, update a job (probably in order to start/stop it), delete a job (stopping it if it's running) etc. Those operations map easily onto a REST API design.
Once you have this in place, you can easily add different data types that your bulk data updater can handle, maybe even mixed together in the same task. There's no need to have this same API duplicated all over your app for each type of thing you want to import, in other words.
This also lends itself very easily to a background-task implementation. In that case you probably want to add fields to the individual task objects that allow the API client to specify how they want to be notified (a URL they want you to GET when it's done, or send them an e-mail, etc.).
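Just to illustrate the shape of such an API, here is a sketch as a JAX-RS resource (all class names, paths and the JobStore helper are made up):
import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/bulk-updates")
public class BulkUpdateResource {

    @POST                                        // submit a new bulk-update job (the CSV/XML payload)
    @Consumes(MediaType.TEXT_PLAIN)
    public Response createJob(String payload) {
        String jobId = JobStore.create(payload); // JobStore is hypothetical
        return Response.status(Response.Status.ACCEPTED)
                       .header("Location", "/bulk-updates/" + jobId)
                       .build();
    }

    @GET
    @Path("/{id}")                               // poll the status of a job
    @Produces(MediaType.APPLICATION_JSON)
    public String getStatus(@PathParam("id") String id) {
        return JobStore.statusAsJson(id);
    }

    @DELETE
    @Path("/{id}")                               // cancel a job (stopping it if it is running)
    public Response cancel(@PathParam("id") String id) {
        JobStore.cancel(id);
        return Response.noContent().build();
    }
}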
Yes, PUT creates/overwrites, but does not partially update.
If you need partial update semantics, use PATCH. See http://greenbytes.de/tech/webdav/draft-dusseault-http-patch-14.html.
You should use AtomPub. It is specifically designed for managing collections via HTTP. There might even be an implementation for your language of choice.
For the POSTs, at least, it seems like you should be able to POST to a list URL and have the body of the request contain a list of new resources instead of a single new resource.
As far as I understand it, REST means REpresentational State Transfer, so you should transfer the state from client to server.
If that means too much data going back and forth, perhaps you need to change your representation. A collectionChange structure would work, with a series of deletions (by id) and additions (with embedded full xml Representations), POSTed to a handling interface URL. The interface implementation can choose its own method for deletions and additions server-side.
The purest version would probably be to define the items by URL, and the collection contain a series of URLs. The new collection can be PUT after changes by the client, followed by a series of PUTs of the items being added, and perhaps a series of deletions if you want to actually remove the items from the server rather than just remove them from that list.
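Purely as an illustration, the collectionChange structure mentioned above could be modelled along these lines (all names are made up):
import java.util.ArrayList;
import java.util.List;

// Illustrative payload for a bulk change: deletions by id, additions with full representations.
public class CollectionChange {
    private final List<String> deletions = new ArrayList<String>();   // ids of items to remove
    private final List<Item> additions = new ArrayList<Item>();       // full representations of items to add

    public List<String> getDeletions() { return deletions; }
    public List<Item> getAdditions() { return additions; }
}

// Minimal item representation; in practice this is whatever your resource looks like.
class Item {
    String id;
    String body;
}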
You could introduce a meta-representation of existing collection elements that don't need their entire state transferred, so in some abstract code your update could look like this:
{existing elements 1-100}
{new element foo with values "bar", "baz"}
{existing element 105}
{new element foobar with values "bar", "foo"}
{existing elements 110-200}
Adding (and modifying) elements is done by defining their values, deleting elements is done by not mentioning them in the new collection, and reordering elements is done by specifying the new order (if order is stored at all).
This way you can easily represent the entire new collection without having to re-transmit the entire content. Using an If-Unmodified-Since header makes sure that your idea of the content indeed matches the server's idea (so that you don't accidentally remove elements that you simply didn't know about when the request was submitted).
The best way is:
1. Pass only an array of IDs of the deletable objects from the front-end application to the Web API.
2. Then you have two options:
2.1 Web API way: find all the collections/entities using the ID array and delete them in the API, but you need to take care of dependent entities, such as foreign-key related table data, too.
2.2 Database way: pass the IDs to your database side, find all records in the foreign-key tables and the primary-key tables, and delete them in the same order, i.e. foreign-key table records first, then primary-key table records.
I've got an ASP.NET page that has a bunch of controls that need to be populated (e.g. dropdown lists).
I'd like to make a single trip to the db and bring back multiple recordsets instead of making a round-trip for each control.
I could bring back multiple tables in a DataSet, or I could bring back a DataReader and use '.NextResult' to put each result set into a custom business class.
Will I likely see a big enough performance advantage using the DataReader approach, or should I just use the DataSet approach?
Any examples of how you usually handle this would be appreciated.
If you have more than 1000 records to bring back from your database,
if you are not very interested in custom storing and custom paging (for a GridView),
if your server is under memory stress,
and if there is no problem connecting to your database every time the page is called,
then I think it is better to use a DataReader.

Otherwise, if you have fewer than 1000 records to bring back from your database,
if you are interested in storing and paging (for a GridView),
if your server is not under memory stress,
and if you want to connect to your database just one time and get the benefits of caching,
then I think it is better to use a DataSet.

I hope that I'm right.
Always put your data into classes defined for the specific usage. Don't pass DataSets or DataReaders around.
If your stored proc returns multiple sets, use the DataReader.NextResult to advance to the next chunk of data. This way you can get all your data, load it to your objects, and close the reader as soon as possible. This will be the fastest method to get your data.
Seeing that no answer has been marked yet even though there are plenty of good answers already, I thought I'd add my two bits as well.
I'd use DataReaders, as they are quite a bit faster (if performance is your thing or you need as much as you can get). Most projects I've worked on have millions of records in each of the tables and performance is a concern.
Some people have said it's not a good idea to send DataReader across layers. I personally don't see this as a problem since a "DbDataReader" is not technically tied (or doesn't have to be) to a database. That is you can create an instance of a DbDataReader without the need for a database.
Why I do it is for the following reasons:
Frequently (in a web application) you are generating either HTML, XML, JSON or some other transformation of your data. So why go from a DataReader to some POCO object only to transform it back to XML or JSON and send it down the wire? This kind of process typically requires 3 transformations and a boat load of object instantiations only to throw them away almost instantly.
In some situations that's fine or can't be helped. My data layer typically surfaces two methods for each stored procedure I have in the system. One returns a DbDataReader and the other returns a DataSet/DataTable. The methods that return a DataSet/DataTable call the method returning a DbDataReader and then use either the "Load" method of the DataTable or an adapter to fill the dataset. Sometimes you need DataSets because you probably have to repurpose the data in some way, or you need to fire another query and, unless you have MARS enabled, you can't have a DbDataReader open and fire another query.
Now there are some issues with using DbDataReader or DataSet/DataTable, typically code clarity, compile-time checking etc. You can use wrapper classes for your DataReaders, and in fact you can use your DataReaders as an IEnumerable with them. Really cool capability. So not only do you get strong typing and code readability, you also get IEnumerable!
So a class might look like this.
using System;
using System.Data.Common;

// Strongly typed, read-only view over a DbDataReader row (BaseDbDataReaderWrapper
// is the author's own base class that holds the DbDataReader instance).
public sealed class BlogItemDrw : BaseDbDataReaderWrapper
{
    public Int64 ItemId { get { return (Int64)DbDataReader[0]; } }
    public Int64 MemberId { get { return (Int64)DbDataReader[1]; } }
    public String ItemTitle { get { return (String)DbDataReader[2]; } }

    // Nullable column: map DBNull to the type's default value
    public String ItemDesc { get { if (DbDataReader[3] != DBNull.Value) return (String)DbDataReader[3]; else return default(String); } }

    public DateTime ItemPubdate { get { return (DateTime)DbDataReader[4]; } }
    public Int32 ItemCommentCnt { get { return (Int32)DbDataReader[5]; } }
    public Boolean ItemAllowComment { get { return (Boolean)DbDataReader[6]; } }

    public BlogItemDrw()
        : base()
    {
    }

    public BlogItemDrw(DbDataReader dbDataReader)
        : base(dbDataReader)
    {
    }
}
DataReader Wrapper
I have a blog post (link above) that goes into a lot more detail and I'll be making a source code generator for these and other DataAccess layer code.
You can use the same technique for DataTables (the code generator produces the code) so you can treat them as strongly typed DataTable without the overhead of what VS.NET provides out of the box.
Keep in mind that there is only one instance of the wrapper class. So you're not creating hundreds of instances of a class only to throw it away.
Map the DataReader to intermediate objects and then bind your controls using those objects. It can be ok to use DataSets in certain circumstances, but those are few and far between when you have strong reasons for "just getting data". Whatever you do, don't pass a DataReader to your controls to bind off of (not that you said that you were considering that).
My personal preference would be to use an ORM, but if you are going to hand-roll your data access, by all means I think you should prefer mapping DataReaders to objects over using DataSets. Using .NextResult as a way to limit yourself from hitting the database multiple times is a double-edged sword, however, so choose wisely. You will find yourself repeating yourself if you try to create procs that always grab exactly what you need using only one call to the database. If your application is only a few pages, it is probably OK, but things can get out of control quickly. Personally I'd rather have one proc per object type and then hit the database multiple times (once for each object type) in order to maximize maintainability. This is where an ORM shines, because a good one will generate SQL that will get you exactly what you want with one call in most cases.
If you are not interested in updating or deleting the records you fetched from the database, I would suggest using a DataReader. A DataSet is basically filled using a DataReader internally (via a DataAdapter), so a DataReader should give you a good performance advantage.
In almost every situation DataReaders are the best solution for reading from a database. DataReaders are faster and require less memory than DataTables or DataSets.
Also, DataSets can often lead to situations in which the OO model is broken. It's not very object oriented to be passing around relational data/schemata instead of objects that know how to manipulate that data.
So, for extensibility, scalability, modularity, and performance reasons, always use DataReaders if you consider yourself a Real Programmer™ ;)
Check the links for facts and discussion about the two in practice and theory.
Irrespective of whether you're fetching a single result-set or multiple result-sets, the consensus seems to be to use a DataReader instead of a DataSet.
In regards to whether you should ever bother with multiple result-sets, the wisdom is that you shouldn't, but I can conceive of a reasonable class of exceptions to that rule: (tightly) related result-sets. You certainly don't want to add a query to return the same set of choices for a drop-down list that's repeated on hundreds or even dozens of pages in your app, but several sets narrowly-used may be sensibly combined. As an example, I'm currently creating a page to display several sets of 'discrepancies' for a batch of ETL data. None of those queries are likely to be used elsewhere, so it would be convenient to encapsulate them as a single 'get discrepancies' sproc. On the other hand, the better performance of doing so may be insignificant compared to working-around the natural one-sproc-one-result-set architecture of your ORM or hand-rolled data-access code.
I have moved to a method that uses DataReaders for all calls, and I have noticed a marked performance improvement, especially in cases when I am loading drop-down lists and other simple items like that.
Personally, with multiple drop-downs, I typically pull individual chunks of data rather than, say, using a stored procedure that returns 5 result sets.
Take a look into the TableAdapters that are available with .NET 2.0 and up. What they do is give you the strength of a strongly-typed DataTable and allow you to map a Fill method to it that will use a DataReader to load it up. Your Fill method can use existing stored procedures or your own ad hoc SQL, or you can even let the wizard generate the ad hoc SQL or stored procedure for you.
You can find this by starting up a new XSD DataSet object within your project. For tables that are used for more than just lookup, you can also map insert/update/delete methods to the TableAdapter.