What is the relationship between Collection and Stream?

In Java 8 a Stream can be obtained from a Collection via the default method Stream<E> stream() declared in Collection<E>. I would like to express the relationship between Stream and Collection in UML for practice.
I think the proper relationship is a dependency, but I am not sure about it. So: what is the proper UML relationship between Collection and Stream? And if it is a dependency, what is the rationale for calling it one?

A Collection of E is an aggregate, and it provides a method stream() which returns a Stream of E that uses the collection as its source.
So the relationship is rather complicated: there is a <<create>> dependency from Collection to Stream. But at the same time, there is a potentially navigable association from the Stream back to the Collection, although this is not visible to the outside world. By the way, you could represent both with a templateable element.
So in theory you could model something like this: a class diagram showing both the <<create>> dependency and the association.
Note however that in practice you should omit the association between the stream and the collection, because it is not usable by the outside world. It only makes sense if you are interested in the inner design of the Java class library. You would do better to put a comment on the <<create>> dependency explaining in plain language that the one serves as source for the other.
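To make this concrete in code rather than UML, here is a minimal Java sketch of both sides of the relationship: the <<create>> side (the collection manufactures the stream) and the hidden back-reference (the stream pulls its elements from the collection that created it). This is illustration only, not the JDK's internal implementation.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class CollectionStreamDemo {
    public static void main(String[] args) {
        // The Collection is the aggregate and acts as the stream's source.
        List<String> words = Arrays.asList("alpha", "beta", "gamma");

        // Collection --<<create>>--> Stream: the default method stream()
        // manufactures a new Stream backed by this collection.
        Stream<String> stream = words.stream();

        // The stream navigates back to its source to pull elements.
        long count = stream.filter(w -> w.length() > 4).count();
        System.out.println(count); // prints 2 ("alpha" and "gamma")
    }
}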

Related

Do unused properties consume space in the GAE Datastore?

In my design I have to store a lot of properties (say, 20 properties) in the same Datastore kind.
But usually most of the entities will only use about 5 of those properties.
Is this design resource-consuming? Will the unused properties cost any storage or performance?
Thanks,
Karthick.
If I understand your question correctly, you are envisioning a system where you have a Kind in your Datastore whose Entities can have differing subsets of a common property-key space W. Entity 1's property set might be {W[0], W[1]}, and Entity 2's property set might be {W[1], W[2], W[5]}. You want to know whether this polymorphism (or "schemalessness") will cost you space, and whether each Entity will reserve storage for every property in W, as in some naive MySQL implementations.
The short answer is no. Due to the schemaless nature of the Datastore, having polymorphic entities in a kind (entities with different names and combinations of properties) will not consume extra space. The only way for these "unused" properties to consume extra space is if you actually set them on the entity, but set them to null. If you are using the low-level API, you manually add the properties to the entity before saving it. Think of them as properties on a JSON object: if they aren't there, they aren't there.
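As a minimal sketch with the low-level Java API (the kind and property names here are made up for illustration), an entity only ever contains the properties you explicitly set on it:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

public class SparseEntityDemo {
    public static void main(String[] args) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        // Only 2 of the (say) 20 possible properties are set here; the other
        // 18 do not exist on this entity and consume no storage at all.
        Entity customer = new Entity("Customer");
        customer.setProperty("name", "Karthick");
        customer.setProperty("city", "Chennai");
        datastore.put(customer);
    }
}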
In MySQL, having a table with many NULL-able columns can be a bad idea, depending on the engine, indexes, etc., but take a look at this talk if you want to learn more about how the Datastore actually stores its data using BigTable. It is a different storage implementation underneath, and so different best practices and possibilities apply.

GAE Datastore kind design with a requirement of too many entity fields

One of my Datastore entities is growing with too many fields, and hence it could become a future performance bottleneck.
As of now the entity consists of 100 fields; if I need to fetch 100 entities, each having 100 fields, that would definitely be a performance hit (considering the underlying data serialization and deserialization while fetching the data from the Datastore).
So is it a good idea to convert the entire entity to a blob, store it under a key, and later parse the data back into the required object format?
Any valuable suggestions please?
Unless you have done some profiling and find that serialization is a real bottleneck, I wouldn't worry about how many fields you have. Object assembly and disassembly in Java is fast. In the unlikely event that you actually are hitting limits (say, thousands of entities with thousands of fields) you can write a custom Objectify translator that eliminates all the reflection overhead.
This sounds like premature optimization.
I'm not so sure that converting the entity to a blob will improve performance much, since you will still need to deserialize the blob into entities later on in your application code.
If you never need all the fields of the object, then one method of increasing performance is to use projection queries. (See https://developers.google.com/appengine/docs/java/datastore/projectionqueries)
Projection queries basically allow you to return only the properties you require. This works because they use the information stored in the indexes and therefore never need to deserialize the entity. It also means you must have an index defined for any property used in a projection query.
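A sketch of such a projection query with the low-level Java API (kind and property names invented for illustration; each projected property must be covered by an index):

import java.util.Date;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.PropertyProjection;
import com.google.appengine.api.datastore.Query;

public class ProjectionDemo {
    public static void main(String[] args) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        // Ask only for the 2 fields we need instead of all 100;
        // the results are served straight from the indexes.
        Query q = new Query("BigEntity");
        q.addProjection(new PropertyProjection("title", String.class));
        q.addProjection(new PropertyProjection("updated", Date.class));

        for (Entity e : datastore.prepare(q).asIterable()) {
            System.out.println(e.getProperty("title") + " / " + e.getProperty("updated"));
        }
    }
}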

How to define an entity property that cannot be described by associations

I am writing a vocabulary learning application.
I have a Wordset Entity.
I want it to contain a property WordsToLearn: a collection of words to learn for the CURRENT user, i.e. words which are either new (no repetitions for the current user) or have a repetition due today or earlier.
How can I implement this?
Without this, my object seems very incomplete.
Are entities limited to simple relationships, so that I should forget about it here and move this logic to a Wordset repository?
It would be very useful to be able to get that information (wordsToLearn) from the Wordset object.
Yes, entities are limited to these simple relationships. For more complex queries you have to use a WordsetRepository to which you can pass your current_user object, and use it to fetch the desired entities in your controllers. You can use Doctrine's DQL to fetch 'real' entity objects instead of just SQL results.
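Doctrine is PHP, but the repository idea carries over directly to the Java/JPA stack used elsewhere on this page. A hypothetical sketch (the entity classes and the query are invented for illustration, not taken from the question):

import java.util.Date;
import java.util.List;
import javax.persistence.*;

@Entity
class Word {
    @Id @GeneratedValue Long id;
    String text;
}

@Entity
class Repetition {
    @Id @GeneratedValue Long id;
    @ManyToOne Word word;
    Long userId;
    @Temporal(TemporalType.DATE) Date dueDate;
}

// The query logic lives in a repository, not on the Wordset entity itself.
class WordsetRepository {
    private final EntityManager em;

    WordsetRepository(EntityManager em) { this.em = em; }

    // Words that are new for this user, or whose repetition is due today or earlier.
    List<Word> findWordsToLearn(Long userId) {
        return em.createQuery(
            "SELECT w FROM Word w WHERE "
          + "NOT EXISTS (SELECT r FROM Repetition r WHERE r.word = w AND r.userId = :uid) "
          + "OR EXISTS (SELECT r2 FROM Repetition r2 WHERE r2.word = w "
          + "AND r2.userId = :uid AND r2.dueDate <= CURRENT_DATE)",
            Word.class)
          .setParameter("uid", userId)
          .getResultList();
    }
}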

Should I use DTOs or domain objects in flex when connecting to blazeds/java/jpa using remote-services?

I am working on an Adobe Flex front-end with a Java back-end using JPA for persistence. The protocol I am using is remote objects (AMF) implemented with BlazeDS.
I started out with a service facade and entity DAOs, but without any specific DTOs. The same POJOs, the domain objects, were passed through the service facade and on to the Hibernate DAOs.
However, over the last few days I have been wondering whether this is a good approach. The latest article I read on the subject was this one:
Interesting JPA Pattern blog
The situation:
Say I have a POJO Book with a unidirectional ManyToOne relation to the POJO Category (i.e., each book may only be associated with one category, but the same category may be associated with many books).
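For concreteness, a minimal JPA sketch of that model (the exact annotations and field names are my own, for illustration only):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

@Entity
class Category {
    @Id @GeneratedValue Long id;
    String name;
}

@Entity
class Book {
    @Id @GeneratedValue Long id;
    String title;

    // Unidirectional: Book knows its Category, but Category
    // holds no reference back to its Books.
    @ManyToOne
    Category category;
}

I see some alternatives: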
Alternative 1:
I expose a method/operation addUpdateBook(Book book). In the implementation of this operation I add/update both the book and the referenced category. That is, if the client submits a book with a category that doesn't already exist, the client may implicitly edit categories via the addUpdateBook operation.
pro: the client is working directly with the domain model!
con: the entire category information will be sent when a new book is added, even though a reference to the category would be sufficient
Alternative 2:
I expose a method/operation addUpdateBook(Book book,Long categoryId). In the implementation I retrieve the category for the given categoryId and replace the category given in the book POJO and then I persist the book. In other words, I ignore any category in the book object, I just look at the categoryId. This means that the client would need to use another operation in order to modify the category.
pro: the client can still work more or less on the domain model, but ...
con: ... it is confusing for the client that the category of the book object will be ignored
con: the entire category information of the book will be sent, even if the server will never read it
pro: it may be clearer when a separate operation should be used for category modifications
con: I need to retrieve the category before persisting the book. I guess this means some overhead.
Alternative 3:
I expose a method/operation addUpdateBook(BookDTO bookDto). The POJO BookDTO looks like the POJO Book, but instead of a field Category category it has a field Long categoryId. In the implementation I retrieve the Category for the given categoryId before I persist the Book (see the sketch after the pro/con list below).
pro: not confusing for the client
con(?): what should the method getBook(Long bookId) return? Should it return only the BookDTO? Then the client would also have to invoke the operation getCategory(Long categoryId) in order to have "the entire book information", and would need to assemble the different parts into a local domain representation of the book. Compared to alternative 1 this would be more complex on the client side?
con: I need to retrieve the category before persisting the book. I guess this means some overhead.
con: being forced to use DTOs makes the client deal with physical details and keeps it somewhat distant from the actual domain model. It seems like I am missing the point of having a domain model and using JPA in the business layer.
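As promised above, a rough server-side sketch of alternative 3 (hypothetical names, reusing the Book/Category entities sketched earlier):

import javax.persistence.EntityManager;

// Flat transfer object: carries a categoryId instead of a Category object.
class BookDTO {
    Long id;
    String title;
    Long categoryId;
}

class BookService {
    private final EntityManager em;

    BookService(EntityManager em) { this.em = em; }

    public void addUpdateBook(BookDTO dto) {
        Book book = (dto.id == null) ? new Book() : em.find(Book.class, dto.id);
        book.title = dto.title;
        // Resolve the reference: the category is looked up by id, so the
        // client can never implicitly edit category data through this call.
        book.category = em.find(Category.class, dto.categoryId);
        em.persist(book);
    }
}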
I guess (!) alternative 3 is the way you would design the operations in a SOA context. However, for me it is not that important to be loosely coupled between client and server. My focus is not on supporting multiple client platforms.
Which alternative would you propose? Are there other better alternatives? Do you know any nice resources, such as code examples, which could help me?
I'm using something related to "Alternative 3". In the beginning I started with domain objects (probably also because of my experience with data services); after a while I found too many problems and switched to DTOs. All the public services now expose only DTOs (for both input and output parameters).
Some of the problems I met while working directly with the domain objects and BlazeDS:
a) You need to break the domain objects' encapsulation (e.g., exposing properties or private constructors) in order to use them for data transfer. Otherwise you have to write your own serialization/deserialization.
b) You need tricks to make data conversion between client and server work, for example using strings instead of dates to avoid timezone differences, or strings instead of int/double. You can solve some of these issues by writing custom proxies, but I still think it's easier to use strings.
c) Most of the time you don't need all the data from the domain objects, and to deal with that you need various frameworks that support data pagination/lazy instantiation on the client. These frameworks introduce complexity, and I try to stay away from that.
The main disadvantage of using DTOs is the amount of boilerplate code needed for the conversion between domain objects and DTOs... but I still prefer them.
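That boilerplate is typically a small hand-written assembler per entity, along these lines (hypothetical, reusing the Book and BookDTO sketches above):

// One assembler per entity: dull but explicit conversion code.
class BookAssembler {

    static BookDTO toDto(Book book) {
        BookDTO dto = new BookDTO();
        dto.id = book.id;
        dto.title = book.title;
        dto.categoryId = (book.category != null) ? book.category.id : null;
        return dto;
    }

    static void updateEntity(Book book, BookDTO dto) {
        book.title = dto.title;
        // book.category is resolved from dto.categoryId in the service layer
    }
}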

Which pattern most closely matches scenario detailed and is it good practice?

I have seen a particular pattern a few times over the last few years. Please let me describe it.
In the UI, each new record (e.g., a new customer's details) is stored on the form without being saved to the database. This has clearly been done so as not to clutter the database or cause unnecessary database hits.
While in this UI state, the objects are identified using a Guid. When they are saved to the database, their Guids are not stored; instead, they are assigned a database int as their primary key.
The form can cope with a mixture of items retrieved from the database (identified by the int) and items that have not yet been committed (identified by the Guid).
When inspecting the form (using Firebug) to see which key was used, we found a two-part delimited combined key. The first part is a Guid (an empty Guid if the item was drawn from the database) and the second part is an integer (zero if the item was not drawn from the database). As one part of the combined key will always uniquely identify a record, it works rather well.
Is this Good practice or not? Can anyone tell me the pattern name or suggest one if it is not already named?
There are a couple of patterns at play here.
Identity Field Pattern
Defined in P of EAA as "Saves a database ID field in an object to maintain identity between an in-memory object and a database row." This part is obvious.
Transaction Script and Metadata Mapping
In general, the ASP.NET DataBound controls use something like a Transaction Script pattern in conjunction with a Metadata Mapping pattern. Fowler defines Metadata Mapping as "holding details of object-relational mapping in metadata". If you have ever written a data source control, the Metadata Mapping aspect of this pattern seems obvious.
The Transaction Script pattern "organizes business logic by procedures where each procedure handles a single request from the presentation." In order to encapsulate the logic of maintaining both presentation state and data-state it is necessary for the intermediary object to indicate:
If a database record exists
How to identify the backend data record, to populate the UI control
How to identify the data and the UI control if there is no current data record, so that the backend datastore can be updated from the presentation data.
The presence of the new client data entry Guid and the data-record integer Id provide adequate information to determine all of this with only a single call to the database. This could be accomplished by just using integers (and perhaps giving a unique negative integer for each unpersisted UI data item), but it is probably more explicit to have two separate fields.
Good or Bad Practice?
It depends. ASP.NET is a pretty successful software project, and this pattern seems to work consistently. However, this type of ASP.NET web control has a very specific scope of application: to encapsulate interaction between a UI and a database for data objects with simple mappings. The concerns do seem a little blurred, but for many applicable scenarios this is still acceptable. The pattern is valid wherever a Row Data Gateway would be acceptable. If more than one database row is affected by a web control, this approach will not work. In those more complex cases, either an Active Record implementation or the combination of a Domain Model and a Repository implementation would be better suited.
Whether a pattern is good or bad practice really depends on the scenario in which it is being applied. It seems like people tend to advocate more complex design structures, because they can be applied to more scenarios without failing. However, in a very simple application where the mappings between data records and the UI are direct, this pattern is very useful because it creates the intended result while minimizing the amount of performance and development overhead.
I don't think there is a specific pattern for that.
Is it good practice? I don't think so. First, it's not very object oriented. How about:
interface ICommittable
{
    /// <summary>
    /// Gets or sets a value indicating whether the entity was already committed to the database.
    /// </summary>
    bool IsCommitted { get; set; }

    /// <summary>
    /// Gets or sets the ID of the entity, used either in the database or generated by the UI or an underlying BL.
    /// </summary>
    Guid Id { get; set; }
}
Instead, what they do is mix three separate pieces of data into one in a non-obvious way:
The ID
Another ID (why?)
A flag indicating whether the entity was committed or not.
In particular, having two separate IDs is extremely confusing and will require not only good documentation, but also time for a new developer to understand what's happening.
If the purpose was to create new entities without querying the database for a new ID, they could have used GUIDs everywhere: when a new entity is created, you generate a new GUID for its ID (Guid.NewGuid()), and then, if needed, you commit everything, this GUID being the identifier in the database too (the chance of a collision between already-saved GUIDs and a new one is so small that I wouldn't worry about it).
Much simpler, isn't it?
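In Java terms (keeping one language for the sketches on this page), the GUID-everywhere alternative amounts to something like this hypothetical class:

import java.util.UUID;

// One client-generated UUID is the identity for the record's whole life,
// before and after it is committed; no second integer key is needed.
class CustomerRecord {
    final UUID id = UUID.randomUUID(); // also stored in the database as-is
    boolean committed;                 // replaces the "empty guid / zero int" trick
    String name;

    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerRecord && id.equals(((CustomerRecord) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

Comparing two records is then a single id comparison, regardless of whether either side has been committed.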
It also makes a few things harder. For example, how do you compare two entities? Remember that:
Two committed entities which have different IDs are not equal,
Two uncommitted entities which have different GUIDs are not equal,
A committed entity may be equal to an uncommitted entity, even though their GUIDs and IDs differ.
To conclude, it looks like a lack of refactoring. Probably they were modifying a project where entities were already identified in the database by an int unique key, so instead of refactoring this, they just added GUIDs, making the overall thing:
More difficult to understand,
Very difficult to work with and to modify in future.
If I'm not wrong it's the repository pattern: http://martinfowler.com/eaaCatalog/repository.html
This is well described in the Evans Domain Driven Design book and has proven to work well under specific circumstances.
