My question is the same as in the title.
The AWS documentation recommends keeping related data together in one table. But reading through various comments from other users, the advice I mostly saw was not to use nested objects (lists of maps). So what is the data model really supposed to look like?
When we use maps/lists, we are often modeling one-to-many relationships. For example, here is how I might model the relationship between Users and their Hobbies:
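For illustration, a single User item could hold the Hobbies in a list attribute (the key format and attribute names here are hypothetical):

    PK           Name   Hobbies
    USER#jane    Jane   [ "photography", "skiing" ]

Fetching the user returns the hobbies in a single read.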
This strategy works fine for modeling one-to-many relationships if you don't have any access patterns around searching users by hobby. But if your access patterns include, say, "fetch all users who like photography", this is not a good way to store your data. Instead, you may consider storing hobbies like this:
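For example, each user-hobby pair could become its own item, keyed by the hobby (again, the key format is illustrative):

    PK                   SK
    HOBBY#photography    USER#jane
    HOBBY#photography    USER#carlos
    HOBBY#skiing         USER#jane

A Query on PK = HOBBY#photography now returns every user with that hobby.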
This data model allows you to fetch users by hobby, something you couldn't do in the prior example.
There is no single way to model data in DynamoDB. Instead, there are common strategies and patterns that model various relationships (one-to-many, many-to-many, etc). The best strategy for any particular situation will depend on the needs of your application.
What are a software's physical specification and logical specification? I understand logical specifications, which can be derived from user requirements by identifying attributes, entities and use-cases and depicting the software graphically using UML. But what is the physical specification of software?
Logical vs physical terminology
The terminology logical vs. physical specification is related to the idea of an implementation-independent specification (logical) that is then refined to take into account implementation details and related constraints (physical).
This distinction can be made for any system viewpoint, such as architecture, data-flows and process design. But the terms are mainly used in the context of data modeling (ERD):
the logical specification describes how data meets the business requirements. Typically, you'd describe entities, their attributes and their relationships;
the physical specification describes how a logical data model is implemented in the database, taking into consideration also technical requirements and constraints. Typically, you'd find tables, columns, primary keys, foreign keys, indexes and everything that matters for the implementation.
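For example, a customer-order relationship might be specified at the two levels like this (names and types are purely illustrative):

    Logical:   entity Customer (name, email) --places 0..*--> entity Order (date, total)

    Physical:  table CUSTOMER (customer_id PK, name VARCHAR(100), email VARCHAR(255))
               table ORDERS   (order_id PK, customer_id FK -> CUSTOMER, order_date DATE, total DECIMAL(10,2))
               index IX_ORDERS_CUSTOMER on ORDERS (customer_id)

The logical level only states that a customer places orders; the physical level commits to column types, keys and an index.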
Remark: The term "physical" probably dates back to the times when you had to carefully design the layout of the data on disk (e.g. in COBOL you had to define the fields of a record at the byte level, and that layout was really used to physically store the data; it was also very difficult to change afterwards).
Purpose oriented terminology
Nowadays, specifications and models tend to be named according to their purpose. But what they are called, and whether they are independent models or successive refinements of the same model, depends very much on the methodology. Some popular terminology:
Requirement specification / Analysis model, to express the business needs (i.e. problem space)
Design specification / model, to describe the solution (i.e. solution space)
Implementation specification / model, with all the technical details (i.e. one-to-one with the code, and therefore difficult to keep in sync).
Domain model, to express the design of business objects and business logic in a given domain, but without any application-specific design (i.e. like design model but with only elements that are of interest for the business).
UML
UML is UML, and the same kind of diagram may be used for different purposes. For example:
A use-case diagram generally represents user goals and tends to be mapped to requirements ("logical"). But use-cases can also show the relationship of an autonomous device / independent component to technical actors in its environment ("physical").
A class diagram can be used to document a domain model ("logical"). But a class diagram can also document the implementation details ("physical"). See for example this article with an example of logical vs. physical class diagram.
In my application, I have a domain model which is essentially a graph. I need to perform the following operations and then send the resulting graph to the client over the network.
Operations to be performed
Filter certain nodes based on business policy
Augment it with more nodes and relationships (potentially from other data providers)
After filtering, I need a serialization mechanism as well. Having worked with Neo4j and TinkerPop, I feel TinkerPop fits my use case well, as it has:
In-memory graph support (TinkerGraph)
Serialization mechanisms: GraphML, GML and GraphSON
I am wondering if my understanding is accurate and my approach is correct. Please advise.
Sounds right. I often extract subgraphs and store them in a TinkerGraph for follow-on processing. I also use GraphSON for serialization. Seems like you're on the right track.
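If your client stack happens to be .NET, a minimal Gremlin.Net sketch for querying a Gremlin Server backed by a TinkerGraph could look like the following (the server address, labels and the "visibility" policy property are all hypothetical; on the JVM you could instead open an embedded TinkerGraph directly):

    // Minimal sketch using the Gremlin.Net NuGet package; names are hypothetical.
    using Gremlin.Net.Driver;
    using Gremlin.Net.Driver.Remote;
    using static Gremlin.Net.Process.Traversal.AnonymousTraversalSource;

    using var client = new GremlinClient(new GremlinServer("localhost", 8182));
    var g = Traversal().WithRemote(new DriverRemoteConnection(client));

    // Filter vertices according to a business policy and read them back;
    // the driver serializes results as GraphSON/GraphBinary on the wire.
    var visible = g.V().HasLabel("asset")
                   .Has("visibility", "public")
                   .ValueMap<object, object>()
                   .ToList();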
Here are 2 good sources for additional information:
gremlindocs.com
https://groups.google.com/forum/#!forum/gremlin-users
When do I have to create a CRUD matrix? After the conceptual model has been created, after normalization, or when the functional requirements are specified?
Good question. It depends on what you need the CRUD matrix for. Originally it was typically used to divide a larger administrative system into smaller sub-systems in which the functions and entities (tables) are strongly related. Of course, the boundaries are never hard, because some data will always be shared; otherwise you would be creating totally separate systems. Nowadays I create CRUD matrices when I need them: that can be for a new system while developing the data model, but also during maintenance, to estimate which parts will be affected by a data model change. So to answer your question: it depends on when you need it. When you need it, create or update it.
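For illustration, a minimal CRUD matrix could look like this (the functions and entities are hypothetical):

    Function \ Entity     Customer   Order   Product
    Register customer     C          -       -
    Place order           R          C       R
    Maintain catalog      -          -       CRUD

Clusters of create/update/delete activity in such a matrix are a good hint for sub-system boundaries; functions that merely read an entity can cross those boundaries more freely.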
A handy way to do it is to create a graph (in the mathematical sense) in GraphML format (XML). Then load that into yEd and ask it to lay out the graph and create groupings. For a small application, a sample is available as PDF and GraphML.
I read in an article that it's not good practice to pass DataSets between the different layers of a .NET web application (DAL -> BAL -> Pages, and vice versa). Is that correct?
Please give your suggestions.
On the one hand, the problem with datasets and datatables is that they expose database implementation details like column names and types outside of your data access layer. Change a column name in your database or query, and odds are that change is propagated to your dataset as well, forcing a re-compile of any tier that uses the dataset. So if you retrieve data into a dataset, you should convert it to strongly-typed business objects before passing it on.
On the other hand, a dataset doesn't care what kind of database it belongs to. You can use them with Access, Oracle, SQL Server, MySQL, anything. So there is some genericness there that can make them useful when passing data between tiers. And just as the business layer shouldn't care about database details, the data layer shouldn't really need to know what the business objects are, so there's a good argument that you should use them for data interchange at that level.
My normal procedure is to have a sort of one-way "translation" tier between the business and data access layers, so that the business layer only deals with business objects and the data layer only returns generic data. This currently takes one of two forms:
I'll write my data access methods to return datatables or datareaders, then the translation tier will use a factory pattern to convert those rows into the desired strongly-typed business objects.
or
I'll use C# iterator blocks to convert a datareader into an IEnumerable<IDataRecord> in the data access layer, and the translation tier will use them to change that IEnumerable<IDataRecord> into an IEnumerable<MyBusinessObject>, such that the code only ever iterates over the result set one time (see the sketch below).
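A minimal sketch of that second form (the Customer type, column names and connection string are all hypothetical):

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public static class CustomerData
    {
        // Data access layer: expose rows only as generic IDataRecords.
        public static IEnumerable<IDataRecord> GetCustomerRecords(string connectionString)
        {
            using (var cn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("SELECT Id, Name FROM Customers", cn))
            {
                cn.Open();
                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                        yield return rdr; // same reader instance, advanced row by row
                }
            }
        }
    }

    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class CustomerTranslator
    {
        // Translation tier: turn generic records into business objects,
        // still iterating the underlying result set only once.
        public static IEnumerable<Customer> ToCustomers(IEnumerable<IDataRecord> records)
        {
            foreach (var r in records)
                yield return new Customer { Id = r.GetInt32(0), Name = r.GetString(1) };
        }
    }

The business layer only ever sees IEnumerable<Customer>; the first form differs only in materializing a datatable before translation.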
There is nothing wrong with passing around datasets, but it's not a great practice.
Pros:
Easy to pass around and use in .NET apps
No having to code wrapper classes
Lots of functionality built into DataSets
Cons:
A data type that is not really type safe.
Your data field names can change, and all parts of your app will still compile fine until they blow up at runtime.
A heavy object. A DataSet does a ton of stuff, and you probably don't need 90% of it.
Having non-.NET apps talk to your DAL or BAL is not going to be very clean.
There's nothing wrong with passing DataSets from your DAL to your BAL.
I think this Stack Overflow question on DAL best practices sums up the two schools of thought pretty well:
I am in the middle of a "discussion" with a colleague about the best way to implement the data layer in a new application.
One viewpoint is that the data layer should be aware of business objects (our own classes that represent an entity), and be able to work with that object natively.
The opposing viewpoint is that the data layer should be object-agnostic, and purely handle simple data types (strings, bools, dates, etc.)
There is no problem with passing a DataSet across layers. If you look closely, you will notice that a DataSet is passed by reference, not by value, so there is no performance issue there.
Now what you read is also right, but you have to understand the context. If you are passing the dataset across remote boundaries, that is not a recommended practice.
There's nothing fundamentally wrong with doing that. Although the basic idea of having a DAL, BLL and UI layer is that each layer abstracts what's beneath it. E.g. the BLL shouldn't have any knowledge of how the database is structured, because the DAL abstracts that away. If a dataset is loaded in the DAL and then passed straight through the BLL to the pages, the BLL kind of sounds pointless.
The strongest statements often seen about DataSet is not to pass it into or out of a web service. That goes beyond exposing implementation details, and includes exposing details of the platform (.NET).
Although it's possible to change "table" and "column" names in a DataSet from those in the underlying database, you're still largely stuck with the underlying structure of the database. To abstract that, I would use Entity Framework. It allows you, for instance, to define a "Customer" entity which takes data from multiple tables and puts it into a single entity. Code using the entity doesn't need to know whether it is implemented as one table, two tables, or whatever.
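For example, a hedged EF6 sketch of such a Customer entity split across two tables (entity, property and table names are all hypothetical):

    using System.Data.Entity;

    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string ShippingAddress { get; set; }
    }

    public class ShopContext : DbContext
    {
        public DbSet<Customer> Customers { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // Entity splitting: callers see one Customer, storage uses two tables.
            modelBuilder.Entity<Customer>()
                .Map(m =>
                {
                    m.Properties(c => new { c.Id, c.Name });
                    m.ToTable("Customers");
                })
                .Map(m =>
                {
                    m.Properties(c => new { c.Id, c.ShippingAddress });
                    m.ToTable("CustomerAddresses");
                });
        }
    }

Code that queries context.Customers never learns whether the data lives in one table or two.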
Even there, you should not pass these entities outside of a web service boundary. They still pass implementation details outside of the implementation. For instance, properties of the base classes get serialized, even though these are just implementation details.
As far as I've understood, the DataSet requires the DB connection to be open for as long as it is used, which will reduce performance in your application, as it keeps the connection open until the content is rendered.
Instead, I recommend using generic collections, such as IEnumerable<myType> or IQueryable<myType>, where myType is a custom type that you fill with your data.
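A hedged sketch of that mapping (the Product type and column names are hypothetical, and the DataTable is assumed to have been filled elsewhere):

    using System.Collections.Generic;
    using System.Data;
    using System.Linq;

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class ProductMapper
    {
        // Copy rows out of a DataTable into a plain typed collection so that
        // no data-layer object escapes past this point.
        public static IEnumerable<Product> ToProducts(DataTable table) =>
            table.AsEnumerable()
                 .Select(row => new Product
                 {
                     Id = row.Field<int>("Id"),
                     Name = row.Field<string>("Name"),
                 })
                 .ToList();
    }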