I am using JMH. I have two methods which I want to benchmark separately. One method (method A) creates an array of objects. I want another method (method B) to use the same array of objects created in method A. The problem is that by the time method A has finished running, the array contents no longer exist outside method A's scope. How do I deal with shared state between methods?
You should create a @Setup method which creates a collection of instances that you can use for serialization/deserialization. This @Setup method will have to do two things: create the objects that you want to serialize, and the serialized versions of the same.
What you then do is write your test methodA, which performs serialization across all the objects (and compares them against the known-good serialized forms), and then have the second test methodB, which performs the deserialization of the objects and compares them with the known-good values.
In essence, you shouldn't have setup code in your test methods, and you shouldn't assume any kind of ordering between them. Put the setup code that you do once in @Setup and then only read those values afterwards. Make sure you're returning the values or checking them in some way so that they aren't eliminated by the JIT.
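For illustration, here is a minimal sketch of that layout, using plain Java serialization as a stand-in for whichever codec you are actually measuring (the Payload type, array size, and method names are made up for the example):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class SerializationBenchmark {

        // Hypothetical payload; substitute your real object type.
        public static class Payload implements Serializable {
            final int id;
            final String name;
            Payload(int id, String name) { this.id = id; this.name = name; }
        }

        Payload[] objects;    // built once, shared by both benchmarks
        byte[][] serialized;  // known-good serialized forms of the same objects

        @Setup(Level.Trial)
        public void prepare() throws IOException {
            objects = new Payload[1000];
            serialized = new byte[objects.length][];
            for (int i = 0; i < objects.length; i++) {
                objects[i] = new Payload(i, "name-" + i);
                serialized[i] = serialize(objects[i]);
            }
        }

        @Benchmark
        public byte[][] methodA_serialize() throws IOException {
            byte[][] out = new byte[objects.length][];
            for (int i = 0; i < objects.length; i++) {
                out[i] = serialize(objects[i]);
            }
            return out; // returning the result keeps the JIT from eliminating the work
        }

        @Benchmark
        public Object[] methodB_deserialize() throws IOException, ClassNotFoundException {
            Object[] out = new Object[serialized.length];
            for (int i = 0; i < serialized.length; i++) {
                try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(serialized[i]))) {
                    out[i] = in.readObject();
                }
            }
            return out;
        }

        private static byte[] serialize(Object o) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        }
    }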
Does Dart handle the case in which two different calls of an asynchronous function try to add two (or more) objects to a List at the same time? If it does not, is there a way for me to handle this?
I do not need those two new objects to be inserted in a particular order, because I take care of that later on; I only wondered what happens in that unlikely but still possible case.
If you're wondering if there's any kind of locking necessary to prevent race conditions in the List data structure itself, no. As pskink noted in a comment, each Dart isolate runs in its own thread, and as the "isolate" name implies, memory is not shared. Two operations therefore cannot both be actively updating a List at the same time. Once all asynchronous operations complete, your List will contain all of the added items but not with any guaranteed ordering.
If you need to prevent asynchronous operations from being interleaved, you could use package:pool.
I have a requirement where I mostly do point lookups, but I also need to iterate, though not in any specific order. I used OptimizeForPointLookup together with the iterator API, and everything seems to work fine. However, the RocksDB code is documented with the following in options.h against the OptimizeForPointLookup API:
// Use this if you don't need to keep the data sorted, i.e. you'll never use
// an iterator, only Put() and Get() API calls
Is there something I am missing? Interestingly, the iteration also seems to happen in sorted order.
The OptimizeForPointLookup() API makes Get()/Put() operations faster by creating a bloom filter and setting the index type to kHashSearch. As the name suggests, kHashSearch builds a hash over the keys, which makes point lookups faster.
For normal iterator operation, the index type is set to kBinarySearch.
By default, RocksDB inserts data into the memtable in sorted order. Optimizing for point lookups does not affect this insert behaviour of RocksDB, which is why your iteration still comes back sorted.
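For illustration, a minimal sketch using the RocksJava binding (the path, cache size, and keys are arbitrary, and exact tuning behaviour can vary between RocksDB versions):

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.RocksIterator;

    public class PointLookupIteration {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();

            try (Options options = new Options().setCreateIfMissing(true)) {
                // Sets up a bloom filter and the kHashSearch index described above.
                // The argument is the block cache size in MB (64 is arbitrary here).
                options.optimizeForPointLookup(64);

                try (RocksDB db = RocksDB.open(options, "/tmp/pointlookup-demo")) {
                    db.put("b".getBytes(), "2".getBytes());
                    db.put("a".getBytes(), "1".getBytes());

                    // Point lookup: the case this option is tuned for.
                    System.out.println(new String(db.get("a".getBytes())));

                    // Iteration still works, and keys come back in sorted order,
                    // because data is kept sorted in the memtable/SST files anyway.
                    try (RocksIterator it = db.newIterator()) {
                        for (it.seekToFirst(); it.isValid(); it.next()) {
                            System.out.println(new String(it.key()) + " = " + new String(it.value()));
                        }
                    }
                }
            }
        }
    }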
I am newly learning Java 8, and I came across a definition related to functional programming: "a program created using only pure functions; no side effects allowed".
One of the listed side effects is "modifying a data structure in place".
I don't understand this, because at some point we need to talk to a database to store, retrieve, or update data.
If modifying a database is not functional, how do we talk to a database in functional programming?
"Modifying a data structure structure in place" means you directly manipulate the input datastructure (i.e. a List). "Pure functions" mean
the result is only a function of its input and not of some other hidden state
the function can be applied multiple times to the same input, producing the same result; it will not change the input
In Object Oriented Programming, you define behaviour of objects. Behaviour could be to provide read access to the state of the object, write access to it, or both. When combining operations of different concerns, you could introduce side effects.
For example, a Stack and its pop() operation: it will produce a different result for every call because it changes the state of the stack.
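A tiny Java illustration of both points, using a Deque as the stack (the names are only for the example):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class PureVsImpure {

        // Pure: the result depends only on the arguments; nothing else is read or changed.
        static int add(int a, int b) {
            return a + b;
        }

        public static void main(String[] args) {
            System.out.println(add(2, 3)); // 5
            System.out.println(add(2, 3)); // 5 again -- same input, same result

            // Impure: pop() reads and modifies hidden state, so calling it twice
            // on the "same" stack yields different results.
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(1);
            stack.push(2);
            System.out.println(stack.pop()); // 2
            System.out.println(stack.pop()); // 1
        }
    }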
In functional programming, you apply functions to immutable values. Functions represent a flow of data, not a change in state, so the functions themselves are stateless. And the result of a function is either the original input or a different value than the input, but never a modified input.
OO also knows functions, but those aren't pure in all cases, for example sorting: in non-functional programming you rearrange the elements of a list in the original data structure ("in place"). In Java, this is what Collections.sort() does.
In functional programming, you would apply the sort function on an input value (a List) and thereby produce a new value (a new List) with sorted values. The function itself has no state and the state of the input is not modified.
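For example, the two styles side by side in Java:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.stream.Collectors;

    public class SortExample {
        public static void main(String[] args) {
            // Imperative, in-place: the original list is rearranged (a side effect).
            List<Integer> mutable = new ArrayList<>(Arrays.asList(3, 1, 2));
            Collections.sort(mutable);
            System.out.println(mutable);   // [1, 2, 3] -- the input was modified

            // Functional style: the input stays untouched, a new sorted list is produced.
            List<Integer> input = Arrays.asList(3, 1, 2);
            List<Integer> sorted = input.stream()
                                        .sorted()
                                        .collect(Collectors.toList());
            System.out.println(input);     // [3, 1, 2] -- unchanged
            System.out.println(sorted);    // [1, 2, 3]
        }
    }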
So to generalize: given the same input value, applying a function to this value always produces the same result value.
Regarding the database operations: the contents of the database represent a state, which is the combination of all its stored values, tables, etc. (a "snapshot"). Of course you can apply a function to this data, producing new data. Typically you store the results of operations back to the database, thus changing the state of the entire system, but that doesn't mean you change the state of the function nor of its input data. Reapplying the function doesn't violate the pure-function constraints, because you apply the function to new input data. But looking at the entire system as a "data structure" would violate the constraint, because the function application changes the state of the "input".
So the entire database system could hardly be considered functional, but of course you could operate on the data in a functional way.
But Java allows you to do both (OO and FP) and even mix both paradigms, so you could choose whatever approach fits your needs best.
Or, to quote from this answer:
If you have several needs intermixed, mix your paradigms. Do not restrict yourself to only using the lower right corner of your toolbox.
I have a list of algorithms that I want to run on a dataset. For example, say my dataset is a list of addresses. I need to check the validity of the addresses but I have several different algorithms for validating. Say I have validation_one and validation_two. But in the future I will need to add validation_three, validation_four, etc. I need ALL of the validations to run on the address list, even the new ones when they get added.
Is there a design pattern that fits into this? I know strategy is for selecting an algorithm but I specifically need a way to apply all the algorithms on the dataset.
You have not stated a language, but assuming it has generics:
Given a DataSet<T>
Assuming also that there is no cross validation required (i.e. each T can be validated entirely by its own data)
Declare a validation Strategy, with a single method.
IValidate<T>{bool validate(T item);}
validation_one, validation_two, etc. will implement this strategy.
Have a List<IValidate<T>> to which you can add and remove implementations.
For each item in the dataset, call each strategy in the list.
It’s then your choice how you deal with failures.
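Sketched in Java, assuming the dataset is simply a List of a hypothetical Address type and the validator names are placeholders for your own:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Validation strategy with a single method.
    interface IValidate<T> {
        boolean validate(T item);
    }

    // Hypothetical item type standing in for an address record.
    class Address {
        final String zip;
        Address(String zip) { this.zip = zip; }
    }

    // Each validation algorithm is one implementation of the strategy.
    class ValidationOne implements IValidate<Address> {
        public boolean validate(Address a) { return a.zip != null; }
    }

    class ValidationTwo implements IValidate<Address> {
        public boolean validate(Address a) { return a.zip != null && a.zip.length() == 5; }
    }

    public class ValidateAll {
        public static void main(String[] args) {
            List<Address> dataset = Arrays.asList(new Address("12345"), new Address(null));

            // Adding validation_three later is just one more element in this list.
            List<IValidate<Address>> validators = new ArrayList<>();
            validators.add(new ValidationOne());
            validators.add(new ValidationTwo());

            for (Address item : dataset) {
                for (IValidate<Address> validator : validators) {
                    if (!validator.validate(item)) {
                        System.out.println(validator.getClass().getSimpleName()
                                + " failed for zip=" + item.zip);
                    }
                }
            }
        }
    }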
This sounds like the Chain of Responsibility where all handlers in the chain accept the request and pass it to the next handler.
Composite.
Which is basically what the two other answers equate to.
The first answer is an implementation of Composite. The second one is using Chain of Responsibility as a Composite.
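For illustration, a Composite could reuse the IValidate<T> interface from the sketch above, so that a whole group of validations can itself be plugged in wherever a single validation is expected (the class name is assumed, not taken from the answers):

    import java.util.ArrayList;
    import java.util.List;

    // Composite validator: implements the same IValidate<T> contract as its
    // children, so a group of validations is usable wherever one is expected.
    class CompositeValidator<T> implements IValidate<T> {
        private final List<IValidate<T>> children = new ArrayList<>();

        CompositeValidator<T> add(IValidate<T> child) {
            children.add(child);
            return this;
        }

        @Override
        public boolean validate(T item) {
            // The composite passes only if every child validation passes.
            return children.stream().allMatch(v -> v.validate(item));
        }
    }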
If I have to choose between a static method and creating an instance to use an instance method, I will always choose the static method. But what is the actual overhead of creating an instance?
For example, I saw a DAL which could have been done with static classes, but they chose to make it instance-based. Now, in the BLL, at every single call they call something like:
new Customer().GetData();
How bad can this be?
Thanks
The performance penalty should be negligible. In this blog entry someone did a short benchmark, with the result that creating 500,000 objects and adding them to a list cost about 1.5 seconds.
So, since I guess new Customer().GetData(); will be called at most a few hundred times in a single BLL function, the performance penalty can be ignored.
As a side note, either the design or the naming of the class is broken if new Customer().GetData(); is actually used: If class Customer just provides the means to get the data, it should be called something different, like CustomerReader (for lack of a better name). On the other hand, if Customer has an instance state that actually represents a Customer, GetData should be static -- not for reasons of performance, but for reasons of consistency.
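The question reads like .NET, but the split is the same in any OO language. A minimal sketch of the first option (a separate reader type), here in Java with hypothetical names:

    // Hypothetical split: a Customer that only carries state, and a separate
    // reader type that owns the data access. Names are illustrative.
    class Customer {
        private final int id;
        private final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }
        String getName() { return name; }
    }

    class CustomerReader {
        // The data-access call lives here; it no longer pretends to be
        // behaviour of a single Customer instance.
        Customer getData(int customerId) {
            // ... fetch from the database and map to a Customer ...
            return new Customer(customerId, "example");
        }
    }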
Normally one shouldn't be too concerned about object creation overhead in the CLR. The allocation of memory for the new object would be really quick (due to the memory compacting stage of the garbage collector - GC). Creating new objects will take up a bit of memory for the object and put a bit more pressure on the GC (as it will have to clean up the object), but if it's only being used for a short time then it might be collected in an early GC generation which isn't that bad performance wise. Also the performance overhead would be dwarfed by the call to a database.
In general, I'll make the decision whether to create a new object for some related methods or just use a static class (and methods) based on what I require from the object (such as the need to mock/stub it out for tests), and not on the slight difference in performance.
As a side note, whether new Customer().GetData(); is the right way to express such code is questionable: based on that statement, it looks as though the data returned is directly related to a customer instance, and not actually the result of a call to the database to retrieve data.