I'm new to reactive programming and I'm using Reactor through the Micronaut framework with Kotlin. I'm trying to understand the advantages of reactive programming and how it is implemented using map and flatMap on Mono and Flux.
I understand the non-blocking aspect of reactive programming, but I'm unsure whether the operations on the data stream are actually asynchronous.
I've been reading about flatMap and understand that it asynchronously produces inner streams and then merges them into another Flux without maintaining order. The many diagrams I've seen make this easier to understand, but I have some basic questions when it comes to actual use cases.
Example:
fun updateDetails() {
    itemDetailsCrudRepository.getItems()
        .flatMap {
            customerRepository.save(someTransferObject.toEntity(it))
        }
}
In the above example assume itemDetailsCrudRepository.getItems() returns a Flux of a particular Entity. The flatMap operation has to save each of the items in the flux to another table. customerRepository.save() will save the item from the flux and we get the required entity through an instance of a data class someTransferObject.
Now, let's say the getItems() query returned 10 items and we need to save 10 rows in the new table. Is the flatMap operation (saving these items into the new table) applied to each item of the Flux one at a time (synchronously), or do all the saves happen at once, asynchronously?
One thing I read was that if subscribeOn(Schedulers.parallel()) is not applied, the flatMap operation is applied to each item in the Flux one at a time (synchronously). Is this correct?
Please do correct me if my basic knowledge itself is incorrect.
Does Dart handle the case in which two different calls of an asynchronous function try to add two (or more) objects to a List at the same time? If it does not, is there a way for me to handle this?
I do not need those new objects to be inserted in a particular order because I take care of that later on; I only wondered what happens in that unlikely but still possible case.
If you're wondering if there's any kind of locking necessary to prevent race conditions in the List data structure itself, no. As pskink noted in a comment, each Dart isolate runs in its own thread, and as the "isolate" name implies, memory is not shared. Two operations therefore cannot both be actively updating a List at the same time. Once all asynchronous operations complete, your List will contain all of the added items but not with any guaranteed ordering.
If you need to prevent asynchronous operations from being interleaved, you could use package:pool.
In Java 8, a Stream can be obtained from a Collection through the default method Collection.stream(), declared as default Stream<E> stream(). I would like to express the relationship between Stream and Collection in UML for practice.
I think the proper relationship is a dependency, but I am not sure. What is the proper relationship between Collection and Stream in UML terms? And if it is a dependency, what is the principle behind that choice?
The Collection of E is an aggregate, and it provides a method stream() which returns a Stream of E, which uses the collection as source.
So the relationship is rather complicated: there is a <<create>> dependency from Collection to Stream. But at the same time, there is a potentially navigable association from the Stream to the Collection, although this is not visible for the outside world. By the way, you could represent both with a templateable element.
So in theory you could have something like this: Collection<E> with a <<create>> dependency to Stream<E>, plus an association navigable from Stream<E> back to Collection<E>.
Note, however, that in practice you should omit the association between the stream and the collection, because it is not usable by the outside world. It only makes sense if you're interested in the inner design of the Java class library. You'd do better to put a comment on the <<create>> dependency explaining in plain language that the one serves as the source for the other.
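To make the "uses the collection as source" part concrete, here is a small Java sketch. The class and method names are made up for illustration. It relies on the documented behaviour that a stream obtained from an ArrayList reads its source when the terminal operation starts, so an element added after stream() was called is still seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamSourceDemo {
    // Creates a stream from a list, changes the list, and only then runs the
    // terminal operation. The stream reads from the list at that point.
    static String joinAfterAdd() {
        List<String> words = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = words.stream(); // the <<create>> step
        words.add("three");                     // source changed before the terminal operation
        return stream.collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(joinAfterAdd()); // prints "one two three"
    }
}
```

The caller never gets a handle back from the Stream to the List, which is why the association is better left out of a diagram meant for users of the API.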
I am new to Java 8, and I came across a definition related to functional programming: "A program created using only pure functions. No side effects allowed."
One of the side effects listed is "Modifying a data structure in place".
I don't understand this, because at some point we need to talk to a database to store, retrieve, or update data.
If modifying a database is not functional, how do we talk to a database in functional programming?
"Modifying a data structure in place" means you directly manipulate the input data structure (e.g. a List). "Pure functions" means:
the result is only a function of its input and not of some other hidden state
the function can be applied multiple times to the same input, producing the same result, and it will not change the input.
In Object Oriented Programming, you define behaviour of objects. Behaviour could be to provide read access to the state of the object, write access to it, or both. When combining operations of different concerns, you could introduce side effects.
For example, consider a Stack and its pop() operation. It produces a different result on every call because it changes the state of the stack.
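A minimal Java sketch of that difference follows. The class and method names are hypothetical. pop() is called twice on the same stack and returns two different values, while the pure helper returns the same value for the same argument and leaves it untouched.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class PopVersusPure {
    // Impure: each pop() changes the stack, so two calls give two different results.
    static List<Integer> demoPops() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        int first = stack.pop();  // 3
        int second = stack.pop(); // 2
        return Arrays.asList(first, second);
    }

    // Pure: the result depends only on the argument, and the argument is not changed.
    static int last(List<Integer> values) {
        return values.get(values.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(demoPops());                    // prints [3, 2]
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(last(values) + " " + last(values)); // prints "3 3"
    }
}
```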
In functional programming, you apply functions to immutable values. Functions represent a flow of data, not a change in state. So functions themselves are stateless. And the result of a function is either the original input or a different value than the input, but never a modified input.
OO also knows functions, but those aren't pure in all cases. Take sorting, for example: in non-functional programming you rearrange the elements of a list within the original data structure ("in place"). In Java, this is what Collections.sort() does.
In functional programming, you would apply the sort function on an input value (a List) and thereby produce a new value (a new List) with sorted values. The function itself has no state and the state of the input is not modified.
So to generalize: given the same input value, applying a function to this value produces the same result value
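The two sorting styles can be put side by side in a short Java sketch (the class and method names are only for illustration). sortInPlace changes the list it is given, while sortedCopy returns a new list and leaves the input as it was.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class SortStyles {
    // In place: rearranges the elements of the list that was passed in.
    static void sortInPlace(List<Integer> values) {
        Collections.sort(values);
    }

    // Functional style: leaves the input alone and returns a new sorted list.
    static List<Integer> sortedCopy(List<Integer> values) {
        return values.stream().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>(Arrays.asList(3, 1, 2));
        List<Integer> sorted = sortedCopy(input);
        System.out.println(input + " -> " + sorted); // prints "[3, 1, 2] -> [1, 2, 3]"
        sortInPlace(input);
        System.out.println(input);                   // prints "[1, 2, 3]"
    }
}
```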
Regarding database operations: the contents of the database represent a state, which is the combination of all its stored values, tables, etc. (a "snapshot"). Of course you can apply a function to this data, producing new data. Typically you store the results of operations back to the database, thus changing the state of the entire system, but that doesn't mean you change the state of the function or its input data. Reapplying the function doesn't violate the pure-function constraints, because you apply it to new input data. But looking at the entire system as a "data structure" would violate the constraint, because the function application changes the state of the "input".
So the entire database system could hardly be considered functional, but of course you could operate on the data in a functional way.
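One way to picture this is a rough Java sketch in which a plain Map stands in for a database table. That stand-in, and all names here, are assumptions for illustration only. The calculation is a pure function, and only the thin method around it reads and writes state.

```java
import java.util.HashMap;
import java.util.Map;

final class PureCoreDemo {
    // Pure: the same balance and amount always give the same result; nothing else is touched.
    static long deposit(long balance, long amount) {
        return balance + amount;
    }

    // Impure shell: reads the stored state, applies the pure function, stores the result back.
    // The Map is a stand-in for a database table.
    static void depositAndStore(Map<String, Long> accounts, String id, long amount) {
        long current = accounts.getOrDefault(id, 0L);
        accounts.put(id, deposit(current, amount));
    }

    public static void main(String[] args) {
        Map<String, Long> accounts = new HashMap<>();
        depositAndStore(accounts, "alice", 50);
        depositAndStore(accounts, "alice", 50);
        System.out.println(accounts.get("alice")); // prints 100
        System.out.println(deposit(10, 5));        // prints 15, every time
    }
}
```

The second call to depositAndStore sees different stored data, so the system as a whole has state, but deposit itself stays pure.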
But Java allows you to do both (OO and FP) and even mix both paradigms, so you could choose whatever approach fits your needs best.
Or, to quote from this answer:
"If you have several needs intermixed, mix your paradigms. Do not restrict yourself to only using the lower right corner of your toolbox."
If I have to choose between a static method and creating an instance to call an instance method, I will always choose the static method. But what exactly is the overhead of creating an instance?
For example, I saw a DAL that could have been built with static classes, but the authors chose instance classes, so in the BLL every single call looks something like this:
new Customer().GetData();
How bad can this be?
Thanks
The performance penalty should be negligible. In this blog entry someone did a short benchmark, with the result that creating 500,000 objects and adding them to a list cost about 1.5 seconds.
So, since I guess new Customer().GetData(); will be called at most a few hundred times in a single BLL function, the performance penalty can be ignored.
As a side note, either the design or the naming of the class is broken if new Customer().GetData(); is actually used: If class Customer just provides the means to get the data, it should be called something different, like CustomerReader (for lack of a better name). On the other hand, if Customer has an instance state that actually represents a Customer, GetData should be static -- not for reasons of performance, but for reasons of consistency.
Normally one shouldn't be too concerned about object creation overhead in the CLR. The allocation of memory for the new object would be really quick (due to the memory compacting stage of the garbage collector - GC). Creating new objects will take up a bit of memory for the object and put a bit more pressure on the GC (as it will have to clean up the object), but if it's only being used for a short time then it might be collected in an early GC generation which isn't that bad performance wise. Also the performance overhead would be dwarfed by the call to a database.
In general I decide whether to create a new object for some related methods or to use a static class (and methods) based on what I require from the object (such as the need to mock or stub it out for tests), not on the slight difference in performance.
As a side note - whether new Customer().GetData(); is the right place to put such code is questionable - to me it seems like the Data returned is directly related to a customer instance based on that statement and not actually a call to the database to retrieve data.
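The testability argument above (choosing an instance so it can be mocked or stubbed) can be sketched as follows. This is a Java analogue rather than C#, and CustomerReader and CustomerReport are hypothetical names, not taken from the code in the question. Because the report depends on an instance behind an interface, a test can pass in a stub instead of hitting a database, which a static call would not allow.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface; the names are illustrative only.
interface CustomerReader {
    List<String> getCustomerNames();
}

final class CustomerReport {
    private final CustomerReader reader;

    // The dependency is passed in, so a test can supply its own implementation.
    CustomerReport(CustomerReader reader) {
        this.reader = reader;
    }

    String summary() {
        return reader.getCustomerNames().size() + " customers";
    }
}

final class StubDemo {
    public static void main(String[] args) {
        // A lambda works as a stub because CustomerReader has a single method.
        CustomerReader stub = () -> Arrays.asList("Ann", "Bob");
        System.out.println(new CustomerReport(stub).summary()); // prints "2 customers"
    }
}
```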
I've got an ASP.NET page that has a bunch of controls that need to be populated (e.g. dropdown lists).
I'd like to make a single trip to the db and bring back multiple recordsets instead of making a round-trip for each control.
I could bring back multiple tables in a DataSet, or I could bring back a DataReader and use '.NextResult' to put each result set into a custom business class.
Will I likely see a big enough performance advantage using the DataReader approach, or should I just use the DataSet approach?
Any examples of how you usually handle this would be appreciated.
Use a DataReader if:
You have more than 1000 records to fetch from your database.
You are not very interested in custom storing and custom paging (for a GridView).
Your server is under memory pressure.
There is no problem connecting to your database every time the page is called.
Otherwise, use a DataSet if:
You have fewer than 1000 records to fetch from your database.
You are interested in storing and paging (for a GridView).
Your server is not under memory pressure.
You want to connect to your database just once and get the benefits of caching.
I hope that I'm right.
Always put your data into classes defined for the specific usage. Don't pass DataSets or DataReaders around.
If your stored proc returns multiple sets, use the DataReader.NextResult to advance to the next chunk of data. This way you can get all your data, load it to your objects, and close the reader as soon as possible. This will be the fastest method to get your data.
Seeing that no answer has been marked yet even though there are plenty of good answers already, I thought I'd add my two bits as well.
I'd use DataReaders, as they are quite a bit faster (if performance is your thing or you need as much as you can get). Most projects I've worked on have millions of records in each of the tables and performance is a concern.
Some people have said it's not a good idea to send DataReader across layers. I personally don't see this as a problem since a "DbDataReader" is not technically tied (or doesn't have to be) to a database. That is you can create an instance of a DbDataReader without the need for a database.
I do it for the following reasons:
Frequently (in a web application) you are generating HTML, XML, JSON, or some other transformation of your data. So why go from a DataReader to some POCO object only to transform it back to XML or JSON and send it down the wire? This kind of process typically requires three transformations and a boatload of object instantiations, only to throw the objects away almost instantly.
In some situations that's fine or can't be helped. My data layer typically surfaces two methods for each stored procedure I have in the system. One returns a DbDataReader and the other returns a DataSet/DataTable. The methods that return a DataSet/DataTable call the method returning a DbDataReader and then use either the "Load" method of the DataTable or an adapter to fill the DataSet. Sometimes you need DataSets because you have to repurpose the data in some way, or you need to fire another query, and unless you have MARS enabled you can't have a DbDataReader open and fire another query.
Now, there are some issues with using a DbDataReader or a DataSet/DataTable, typically code clarity, compile-time checking, etc. You can use wrapper classes for your DataReader, and in fact you can use your DataReaders as IEnumerable with them. Really cool capability. So not only do you get strong typing and code readability, you also get IEnumerable!
So a class might look like this.
public sealed class BlogItemDrw : BaseDbDataReaderWrapper
{
    public Int64 ItemId { get { return (Int64)DbDataReader[0]; } }
    public Int64 MemberId { get { return (Int64)DbDataReader[1]; } }
    public String ItemTitle { get { return (String)DbDataReader[2]; } }
    public String ItemDesc { get { if (DbDataReader[3] != DBNull.Value) return (String)DbDataReader[3]; else return default(String); } }
    public DateTime ItemPubdate { get { return (DateTime)DbDataReader[4]; } }
    public Int32 ItemCommentCnt { get { return (Int32)DbDataReader[5]; } }
    public Boolean ItemAllowComment { get { return (Boolean)DbDataReader[6]; } }

    public BlogItemDrw()
        : base()
    {
    }

    public BlogItemDrw(DbDataReader dbDataReader)
        : base(dbDataReader)
    {
    }
}
DataReader Wrapper
I have a blog post (link above) that goes into a lot more detail and I'll be making a source code generator for these and other DataAccess layer code.
You can use the same technique for DataTables (the code generator produces the code) so you can treat them as strongly typed DataTable without the overhead of what VS.NET provides out of the box.
Keep in mind that there is only one instance of the wrapper class. So you're not creating hundreds of instances of a class only to throw it away.
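The "only one instance of the wrapper" idea can be shown without a database. The following is a Java analogue, not the author's C# code. RowCursor is a made-up in-memory stand-in for a forward-only reader, and BlogItemView is a hypothetical wrapper with typed accessors. The wrapper is created once and simply reflects whichever row the cursor is currently on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a forward-only reader: one current row at a time.
final class RowCursor {
    private final Iterator<Object[]> rows;
    private Object[] current;

    RowCursor(List<Object[]> rows) {
        this.rows = rows.iterator();
    }

    // Advances to the next row; returns false when there are no more rows.
    boolean read() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    Object get(int index) {
        return current[index];
    }
}

// One wrapper instance gives typed, named access to whatever row is current.
final class BlogItemView {
    private final RowCursor cursor;

    BlogItemView(RowCursor cursor) {
        this.cursor = cursor;
    }

    long itemId() { return (Long) cursor.get(0); }

    String itemTitle() { return (String) cursor.get(1); }
}

final class WrapperDemo {
    static List<String> titles(List<Object[]> rows) {
        RowCursor cursor = new RowCursor(rows);
        BlogItemView item = new BlogItemView(cursor); // created once, reused for every row
        List<String> result = new ArrayList<>();
        while (cursor.read()) {
            result.add(item.itemId() + ":" + item.itemTitle());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1L, "First"});
        rows.add(new Object[] {2L, "Second"});
        System.out.println(titles(rows)); // prints [1:First, 2:Second]
    }
}
```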
Map the DataReader to intermediate objects and then bind your controls using those objects. It can be ok to use DataSets in certain circumstances, but those are few and far between when you have strong reasons for "just getting data". Whatever you do, don't pass a DataReader to your controls to bind off of (not that you said that you were considering that).
My personal preference would be to use an ORM, but if you are going to hand roll your data access, by all means I think you should prefer mapping DataReaders to objects over using DataSets. Using the .NextResult as a way to limit yourself from hitting the database multiple times is a double edged sword however so choose wisely. You will find yourself repeating yourself if you try to create procs that always grab exactly what you need using only one call to the database. If your application is only a few pages, it is probably OK, but things can get out of control quickly. Personally I'd rather have one proc per object type and then hit the database multiple times (once for each object type) in order to maximize maintainability. This is where an ORM shines because a good one will generate Sql that will get you exactly what you want with one call in most cases.
If you are not interested in updating or deleting the records you fetched from the database, I would suggest using a DataReader. A DataSet is itself filled through a DataReader internally (via a DataAdapter), so using the DataReader directly should give you a good performance advantage.
In almost every situation DataReaders are the best solution for reading from a database. DataReaders are faster and require less memory than DataTables or DataSets.
Also, DataSets can often lead to situations in which the OO model is broken. It's not very object oriented to be passing around relational data/schemata instead of objects that know how to manipulate that data.
So, for extensibility, scalability, modularity, and performance reasons, always use DataReaders if you consider yourself a Real Programmer™ ;)
Check the links for facts and discussion about the two in practice and theory.
Irrespective of whether you're fetching a single result-set or multiple result-sets, the consensus seems to be to use a DataReader instead of a DataSet.
In regards to whether you should ever bother with multiple result-sets, the wisdom is that you shouldn't, but I can conceive of a reasonable class of exceptions to that rule: (tightly) related result-sets. You certainly don't want to add a query to return the same set of choices for a drop-down list that's repeated on hundreds or even dozens of pages in your app, but several sets narrowly-used may be sensibly combined. As an example, I'm currently creating a page to display several sets of 'discrepancies' for a batch of ETL data. None of those queries are likely to be used elsewhere, so it would be convenient to encapsulate them as a single 'get discrepancies' sproc. On the other hand, the better performance of doing so may be insignificant compared to working-around the natural one-sproc-one-result-set architecture of your ORM or hand-rolled data-access code.
I have moved to an approach that uses DataReaders for all calls, and I have noticed a marked performance improvement, especially when loading drop-down lists and other simple items like that.
Personally, with multiple drop-downs, I typically pull individual chunks of data rather than use, say, a stored procedure that returns 5 result sets.
Take a look at the TableAdapters that are available with .NET 2.0 and up. They give you the strength of a strongly-typed DataTable and let you map a Fill method to it that uses a DataReader to load it. Your Fill method can use existing stored procedures or your own ad hoc SQL, or you can let the wizard generate the ad hoc SQL or stored procedure for you.
You can find this by starting up a new XSD DataSet object within your project. For tables that are used for more than just lookup, you can also map insert/update/delete methods to the TableAdapter.