Question in short: How can you map/calculate a derived field using JDO/DataNucleus?
Example
An Order can have one or more Items. The field totalItemAmount is the sum of all Items and their amounts. totalItemAmount should not exist as a field in the datastore, but should be calculated.
With Hibernate one could use #Formula to annotate totalItemAmount- see https://stackoverflow.com/a/2986354/2294031 .
Is there an equivalent for JDO/DataNucleus?
Workarounds
Because I have not found anything yet, I considered using alternative approaches. But I am not sure which one would be appropriate.
Implementing totalItemAmount as a method: The total amount of items could be calculated with a method (eg. Order.getTotalItemAmount()). The method iterates over all Items of the Order and sums up the amount of each Item. But I imagine this approach would be very slow if I want to display an overview of many orders. Because each time getTotalItemAmount() gets called, all Items of the Order will be (unnecessarily) fetched.
Defining a custom query: Is it possible to define a custom query, which will be used, when DataNucleus obtains Orders from the datastore?
Treating totalItemAmount as a "normal" column (like number): totalItemAmount will be an integer column and everytime the list of Items from the Order gets updated, the totalItemAmount will be updated also. But I do not like this approach, because it could lead to inconsistency - If the order gets modified outside the context (eg. using plain SQL), the content of totalItemAmount could be wrong.
Using a SQL view: I could define a view as described in Hibernate Derived Properties - Performance and Portability. But this would introduce a considerable amount of work and future maintenance - imho too much for the gain.
Is there another way to solve this problem?
Off-Topic: Feel free to comment on my writing, as I really would like to improve it.
Related
My table is (device, type, value, timestamp), where (device,type,timestamp) makes a unique combination ( a candidate for composite key in non-DynamoDB DBMS).
My queries can range between any of these three attributes, such as
GET (value)s from (device) with (type) having (timestamp) greater than <some-timestamp>
I'm using dynamoosejs/dynamoose. And from most of the searches, I believe I'm supposed to use a combination of the three fields (as a single field ; device-type-timestamp) as id. However the set: function of Schema doesn't let me use the object properties (such as this.device) and due to some reasons, I cannot do it externally.
The closest I got (id:uuidv4:hashKey, device:string:GlobalSecIndex, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and
(id:uuidv4:rangeKey, device:string:hashKey, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and so on..
However, while using a Query, it becomes difficult to fetch results of particular device,type as the id, (hashKey or rangeKey) keeps missing from the scene.
So the question. How would you do it for such kind of table?
And point to be noted, this table is meant to gather content from IoT devices, which is generated every 5 mins by each device on an average.
I'm curious why you are choosing DynamoDB for this task. Advanced queries like this seem to be much better suited for a SQL based database as opposed to a NoSQL database. Due to the advanced nature of SQL queries, this task in my experience is a lot easier in SQL databases. So I would encourage you to think about if DynamoDB is truly the right system for what you are trying to do here.
If you determine it is, you might have to restructure your data a little bit. You could do something like having a property that is device-type and that will be the device and type values combined. Then set that as an index, and query based on that and sort by the timestamp, and filter out the results that are not greater than the value you want.
You are correct that currently, Dynamoose does not pass in the entire object into the set function. This is something that personally I'm open to exploring. I'm a member on the GitHub project, and if you would like to submit a PR adding that feature I would be more than happy to help explore that option with you and get that into the codebase.
The other thing you might want to explore is having a DynamoDB stream, that will set that device-type property whenever it gets added to your DynamoDB table. That would abstract that logic out of DynamoDB and your application. I'm not sure if it's necessary for what you are doing to decouple it to that level, but it might be something you want to explore.
Finally, depending on your setup, you could figure out which item will be more unique, device or type, and setup an index on that property. Then just query based on that, and filter out the results of the other property that you don't want. I'm not sure if that is what you are looking for, it will of course work, but I'm not sure how many items you will have in your table, and there get to be questions about scalability at a certain level. One way to solve some of those scalability questions might be to set the TTL of your items if you know that you the timestamp you are querying for is constant, or predictable ahead of time.
Overall there are a lot of ways to achieve what you are looking to do. Without more detail about how many items, what exactly those properties will be doing, the amount of scalability you require, which of those properties will be most unique, etc. it's hard to give a good solution. I would highly encourage you to think about if NoSQL is truly the best way to go. That query you are looking to do seems a LOT more like a SQL query. Not saying it's impossible in DynamoDB, but it will require some thought about how you want to structure your data model, and such.
Considering opinion of #charlie-fish, I decided to jump into Dynamoose and improvise the code to pass the model to the set function of the attribute. However, I discovered that the model is already being passed to default parameter of the attribute. So I changed my Schema to the following:
id:hashKey;default: function(model){ return model.device + "" + model.type; }
timestamp:rangeKey
For anyone landing here on this answer, please note that the default & set functions can access attribute options & schema instance using this . However both those functions should be regular functions, rather than arrow functions.
Keeping this here as an answer, but I won't accept it as an answer to my question for sometime, as I want to wait for someone else to hit out a better approach.
I also want to make sure that if a value is passed for id field, it shouldn't be set. For this I can use set to ignore the actual incoming value, which I don't know how, as of yet.
Is anyone aware of any bundle, or of any plans for Doctrine to implement a new optimistic locking strategy like the one in Telerik framework?
http://docs.telerik.com/data-access/developers-guide/crud-operations/concurrency-control/data-access-tasks-define-model-concurrency-optimistic#checking_for_any_changes
This strategy is great because it's not using version numbers, or timestamps, it compares the old values of the inputs with new values, and only for the changed inputs.
So if I have two different forms updating the same entity, using the current strategies (timestamp or versioning), 2 users will get into conflict even if the do not update the same data.
I know this is a rather old question on the interwebs, so, I'm going to glaze over how this can be done in theory.
The per column lock implies there is a mechanism for detecting per column changes - therefore either the row is hashed (or the columns we are concerned about) and that hash comparison is used to detect/track changes OR we track each column individually.
This tracking would then have to be applied on a per request basis to determine what's changed, when. This is where the normal #Version decorator logic would apply.
Telerik is able to handle the per column changes because the information is compared (array comparisons, most likely) in the browser. This is an isolated state of the data that is not in sync with other browsers/database until update.
Easiest option here is to not allow the two forms to collide (in terms of data) and remove the lock - OR - decouple the forms data from the primary table and allow them to be updated separately (and merged with a sql join).
I've started using Access 2010 recently and started testing some of the new features, namely the Calculated Field datatype.
I had hoped that this was something that based on a formula (expression builder) would remove an amount of data and shrink an ACCDB file because Access only has the formula not actual data.
However, my new version of the file seems to be larger than the original which IMHO makes the feature a bit useless.
I've searched the interweb regarding the feature and can only really find people who show how to create one rather than any pros and cons about the feature.
As it stands I'm going to go back to the old method of calculations in a query but before I do I thought I'd ask on StackOverflow just in case anybody has used it.
Access stores the results of calculated fields for each record, so yes, that will increase the size of the database. However your claim that this "makes the feature a bit useless" misses the point:
The primary advantage of using calculated fields is that the calculation (expression) is defined once, at the table level. Once the calculated field has been defined it can simply be used much like any other field in queries, reports, etc..
Sure, you can "go back to the old method of calculations in a query" if that suits your purposes, but it also means that
You will have to repeat the (same) calculation logic in all of your queries.
If the calculation logic ever changes then you'll have to go back and edit all of those queries.
Every time you run one of those queries it will have to re-do the calculation for every record, instead of simply retrieving the calculated field from the table.
What is the best way to design the Domain objects which can have multi-lingual fields. An example can be a Product class with Description being multi-lingual.
I have found few links but could not decide which one is the best way.
http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
(This stores all localised language data in one field. Can be a problem if we query from Sql)
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
(This one has a warning at the beginning that it is a hack and no longer supported)
http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
(This does not describe how multilingual data will be structured in the database.)
Anyone having experience with using NHibernate with multi-lingual data. Is there a better way?
The third option looks great. The hibernate mapping is given, but not the database schema - if that's what you are missing, then I'll sketch it out here:
dictionary
----------
ID: int - identity
name: nvarchar(255)
phrase
------
dictionary_id:int (fkey dictionary.ID)
culture_id:int (LCID)
phrase:nvarchar(255) - this is the default size - seems too small
According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to
<element column="phrase" type="String" length="4001"></element>
To use this in your domain model, you add a PhraseDictionary property to your entity where you want translatable text. E.g. the title property or decription property.
I think the article describes a great approach, and is the one that I would go
for.
EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will lazily fetch the collection, but it may fetch all the items at once. You can profile to determine if this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (Say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than each localized text property as a separate query.
Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.
I only have experience for Hibernate, but since nHibernate is so similar:
One option is to define a component type MultilingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient location to place an getter for the string by language id.
class MultiLingualString {
String english;
String chinese;
String klingon;
String forLanguage(Language lang) {
switch (lang) {
// you can guess what goes here
}
}
}
This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.
The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.
If you do this a lot, writing a UserType might be worth it.
From a strictly database oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc) and one table with all of the translatable string data. Let call the two tables Base and Base_Description.
Base ensures that there is a single key for each record, the key might be a string or auto-generated id depending on your particular use case.
The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.languages because we can set the language of the connection with and then grab it with ##LANGID for most operations.
In our testing we found this to be significantly faster than having multiple fields for each language, it also allows you to add other languages more easily. We are also using SQL Server Full-Text indexing and it fully works with this method. You should index in the neutral language and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).
Do your requirements include the domain objects actually having multiple-language properties in the same object? And, if so, is it unlimited translations stored in the object (in a collection, say - in which case I would say that it would need to be just like any master/detail or parent/child collection) or fixed translations, in which case the languages (and thus the mapping to results of a stored proc or whatever) have to be determined statically anyway?
In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.
We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.
I have a series of objects I have created:
Item
Order
Song
etc.
Each object has a reasonable number of properties, and I use a datareader where I pass it "SELECT * FROM .objectname." and then I fill a collection of objects, and return the collection. This works as: GetOrdersCollection(), GetSongsCollection(), etc.
I understand SELECT * to be a performance problem, and additionally, sometimes I prefer to include additional columns in the select statement which do not exist in the object, and have those all returned as well.
So my question is, what is the best way to approach this problem?
Should I create a new object for every query type?
I tried performing a check to see if column is in datareader before storing it, but this presents perf. issues. Is there a negligible perf. way to avoid IndexOutOfRange?
Should I just use Datatable and read right from the table?
I understand SELECT * to be a
performance problem,
It's not a performance problem if there are only a few columns, or you need all of the columns anyway.
1.Should I create a new object for every query type?
You should create a new object for each table, and a new method for each query type.
2.I tried performing a check to see if column is in datareader before storing
it, but this presents perf. issues. Is
there a negligible perf. way to avoid
IndexOutOfRange?
If you are referring to your fields by name rather than index, there shouldn't be any IndexOutOfRange problems. If you are referring to your fields by index, you can loop thru them where your index is less than the column Count(), and there shouldn't be any IndexOutOfRange problems.
3.Should I just use Datatable and read right from the table?
That's a perfectly good approach to start out with. Consider spending some time to learn a simple ORM as others have suggested. Subsonic is a good "first" ORM.
Performance-wise reading from a forward only data structure like DataReader is going to net you the best performance and resource conservation.
On the other hand populating object (like a OR/M does) can be negligible so long as you are not returning more than a handful of objects.
Your first step should be to profile your database and ensure that you have proper indexes. Write some tests to see where your largest time expense is in the process and optimize the target areas that cost you the most.
Are there any reasons you can't use a simple ORM generator like SubSonic? This will allow you to very easily access these types of collections, and they'll be strongly typed. You also won't have to worry about the SQL since the queries will be built by SubSonic.