Implementing IStoreSagaData which arcives - rebus

I'm finding that IStoreSagaData.Delete is not called on when a saga calls MarkAsComplete during the first message. Is this by design? This makes it impossible to keep an archive of sagas which have completed.

Yes, that is true - it's a consequence of the code being so "intelligent" that it knows not to do anything about a saga that is immediately marked as complete.
This also means that it would not be easy to find a way to hook in that functionality yourself - at least in Rebus versions <= 0.84.0.
Rebus versions >= 0.90.0 (also known as "Rebus 2") is much easier to extend in every way, and I've created this issue because I think a good saga state auditing function would be an awesome feature in Rebus.
I might get around to add it one of the following days.

Related

Trigger a function after 5 API calls return (in a distributed context)

My girlfriend was asked the below question in an interview:
We trigger 5 independent APIs simultaneously. Once they have all completed, we want to trigger a function. How will you design a system to do this?
My girlfriend replied she will use a flag variable, but the interviewer was evidently not happy with it.
So, is there a good way in which this could be handled (in a distributed context)? Note that each of the 5 API calls are made by different servers and the function to be triggered is on a 6th server.
The other answers suggesting Promises seem to assume all these requests necessarily come from the same client. If the context here is distributed systems, as you said it is, then I don't think those are valid answers. If they were, then the interview question would have nothing to do with distributed systems, except to essay your girlfriend's ability to recognize something that isn't really a distributed systems problem.
And the question does have the shape of some classic problems in distributed systems. It sounds a lot like YouTube view counting: How do you achieve qualities like atomicity and consistency in a multi-threaded, multi-process, or multi-client environment? Failing to recognize this, thinking the answer could be as simple as "a flag", betrayed a lack of experience in distributed systems.
Another thing about that answer is that it leaves many ambiguities. Where does the flag live? As a variable in another (Java?) API? In a database? In a file? Even in a non-distributed context, these are important questions. And if she had gone on to address these questions, even being innocent of all the distributed systems complications, she might have happily fallen into a discussion of the kinds of D.S. problems that occur when you use, say, a file; and how using a ACID-compliant database might solve those problems, and what the tradeoffs might be there... And she might have corrected herself and said "counter" instead of "flag"!
If I were asked this, my first thought would be to use promises/futures. The idea behind them is that you can execute time-consuming operations asynchronously and they will somehow notify you when they've completed, either successfully or unsuccessfully, typically by calling a callback function. So the first step is to spawn five asynchronous tasks and get five promises.
Then I would join the five promises together, creating a unified promise that represents the five separate tasks. In JavaScript I might call Promise.all(); in Java I would use CompletableFuture.allOf().
I would want to make sure to handle both success and failure. The combined promise should succeed if all of the API calls succeed and fail if any of them fail. If any fail there should be appropriate error handling/reporting. What happens if multiple calls fail? How would a mix of successes and failures be reported? These would be design points to mention, though not necessarily solve during the interview.
Promises and futures typically have modular layering system that would allow edge cases like timeouts to be handled by chaining handlers together. If done right, timeouts could become just another error condition that would be naturally handled by the error handling already in place.
This solution would not require any state to be shared across threads, so I would not have to worry about mutexes or deadlocks or other thread synchronization problems.
She said she would use a flag variable to keep track of the number of API calls have returned.
One thing that makes great interviewees stand out is their ability to anticipate follow-up questions and explain details before they are asked. The best answers are fully fleshed out. They demonstrate that one has thought through one's answer in detail, and they have minimal handwaving.
When I read the above I have a slew of follow-up questions:
How will she know when each API call has returned? Is she waiting for a function call to return, a callback to be called, an event to be fired, or a promise to complete?
How is she causing all of the API calls to be executed concurrently? Is there multithreading, a fork-join pool, multiprocessing, or asynchronous execution?
Flag variables are booleans. Is she really using a flag, or does she mean a counter?
What is the variable tracking and what code is updating it?
What is monitoring the variable, what condition is it checking, and what's it doing when the condition is reached?
If using multithreading, how is she handling synchronization?
How will she handle edge cases such API calls failing, or timing out?
A flag variable might lead to a workable solution or it might lead nowhere. The only way an interviewer will know which it is is if she thinks about and proactively discusses these various questions. Otherwise, the interviewer will have to pepper her with follow-up questions, and will likely lower their evaluation of her.
When I interview people, my mental grades are something like:
S — Solution works and they addressed all issues without prompting.
A — Solution works, follow-up questions answered satisfactorily.
B — Solution works, explained well, but there's a better solution that more experienced devs would find.
C — What they said is okay, but their depth of knowledge is lacking.
F — Their answer is flat out incorrect, or getting them to explain their answer was like pulling teeth.

Client callbacks with protobuf-net.Grpc

I'm currently working on replacing an old WCF client/server pairing with gRpc, and decided to use protobuf-net.Grpc as we've used protobuf-net extensively elsewhere in our codebase. I'm running into a bit of trouble with one particular portion however.
Part of the original service is a Subscribe method which uses IClientCallback to effectively send an event to the client. Looking at regular gRpc, it seems like this would be possible (though a bit hacky) using a server streaming method and storing the IServerStreamWriter object on the server, writing to it whenever we wanted to "fire an event".
For the life of me, however, I can't quite figure out how to do something similar in protobuf-net.Grpc with the IAsyncEnumerable return type. The closest I can figure is using Task.Wait in a loop and updating some shared collection when I want to "fire" the event, which the loop would then check for and yield return. This doesn't seem like it'd scale well, however, and there isn't really a great way to definitely unsubscribe when a client is no longer listening to events.
Is there some other/better way to do this?
Channel<T>, which can be tweaked via AsAsyncEnumerable() - which then essentially acts as a queue at the producer side, and a sequence at the consumer.

What logic must I cover in Collection.allow and Collection.deny to ensure it's secure?

So just started playing with Meteor and trying to get my head around the security model. It seems there's two ways to modify data.
The Meteor.call way which seems pretty standard - pretty much just a call to the server with its own set of business rules implemented.
Then there is the Collection.allow method which seems much more different to anything I've done before. So it seems that if you put an collection.allow, you're saying that the client can make any write operation to that collection as long as it can get past the validations in its allow function.
That makes me feel uneasy cause it's feels like a lot of freedom and my allow function would need to be pretty long to make sure it's locked down securely enough.
For instance, mongodb has no schema, so you'd have to basically have a rule that defines which fields would be accepted and the format those fields must be in.
Wouldn't you also have to put in the business logic for every type of update that might be made to your system.
So say, I had a SoccerTeam collection. There may be several situations I may need to make a change, like if I'm adding or removing a player, changing the team name, team status has changed etc.
It seems to me that you'd have to put everything into this one massive function. It just sounds like a radical idea, but it seems Meteor.call methods would just be a lot simpler.
Am I thinking about this in the wrong manner (or for the wrong use case?) Does anyone have any example of how they can structure an allow or deny function with a list of what I may need to check in my allow function to make my collection secure?
You are following the same line of reasoning I used in deciding how to handle data mutations when building Edthena. Out of the box, meteor provides you with the tools to make a simple tradeoff:
Do I trust the client and get a more responsive UI (latency compensation)? Or do I require strict control over data validation, but force the client to wait for an update?
I went with the latter, and exclusively used method calls for a few reasons:
I sleep better a night knowing there exists exactly one way to update each of my collections.
I found that some of my updates required side effects that only made sense to execute on the server (e.g. making denormalized updates to other collections).
At present, there isn't a clear benefit to latency compensation for our app. We found the delay for most writes was inconsequential to the user experience.
allow and deny rules are weak tools. They are essentially only good for validating ownership and other simple checks.
At the time when we first released to production (August 2013) this seemed like a radical conclusion. The meteor docs, the API, and the demos highlight the use of client-side writes, so I wasn't entirely sure I had made the right decision. A couple of months later I had my first opportunity to sit down with several of the meteor core devs - this is a summary of their reaction to my design choices:
This seems like a rational approach. Latency compensation is really useful in some contexts like mobile apps, and games, but may not be worth it for all web apps. It also makes for cool demos.
So there you have it. As of this writing, my advice for production apps would be to use client-side updates where you really need the speed, but you shouldn't feel like you are doing something wrong by making heavy use of methods.
As for the future, I'd imagine that post-1.0 we'll start to see things like built-in schema enforcement on both the client and server which will go a long way towards resolving my concerns. I see Collection2 as a significant first step in that direction, but I haven't tried it yet in any meaningful way.
stubs
A logical follow-up question is "Why not use stubs?". I spent some time investigating this but reached the conclusion that method stubbing wasn't useful to our project for the following reasons:
I like to keep my server code on the server. Stubbing requires that I either ship all of my model code to the client or selectively repeat parts of it again. In a large app, I don't see that as practical.
I found the the overhead required to separate out what may or may not run on the client to be a maintenance challenge.
In order for the stub to do anything other than reject a database mutation, you'd need to have an allow rule in place - otherwise you'd end up with a lot of UI flicker (the client allows the write but the server immediately invalidates it). But having an allow rule defeats the whole point, because a user could still write to the db from the console.
The usual allow methods I have are these:
MyCollection.allow({
insert: false
update: false
remove: false
})
And then, I have methods which take care of all insertions. These methods perform the type checks and permission assessment. I have found that to be a much more maintainable method: completely decoupling the data layer from the code which runs on the client.
For instance, mongodb has no schema, so you'd have to basically have a rule that defines which fields would be accepted and the format those fields must be in.
Take a look at Collection2. They support schema checking at run-time before inserting documents into the Collection.

Weld - Asynchronous event observers

I am using Weld to observe events. I thought there was a way to specify if the observer was asynchronous or not, but I am not finding that annotation or documentation.
Can observers be asynchronous, if so, what do I need to do to make that happen?
There is an open request for this: CDI-31: Asynchronous events.
Depending on your requirements, you can, as indicated in your comment, set a different transactional observer: If you use AFTER_COMPLETION or AFTER_SUCCESS, it should seem to your application like an asynchronous execution. However until a framework solves , I have just found an example using JMS for asynchronous execution in CDI.
Take a look at post on Piotr Nowicki's blog http://piotrnowicki.com/2013/05/asynchronous-cdi-events/
He described a couple of methods for achieving asynchronous behavior of CDI events.
If you guys want to see this happen, you'll need to head over the the link provided in Kariem's answer and voice your opinion. It seems the expert group is unwilling to consider adding async events because they consider it bloating the spec.
Honestly, Guice manages to offer this feature, and it remains lightweight, so I find the argument against this little counter-intuitive. Nevertheless, if you want to see this feature, head to the link, voice your opinion.
-Jonathan

API design: is "fault tolerance" a good thing?

I've consolidated many of the useful answers and came up with my own answer below
For example, I am writing a an API Foo which needs explicit initialization and termination. (Should be language agnostic but I'm using C++ here)
class Foo
{
public:
static void InitLibrary(int someMagicInputRequiredAtRuntime);
static void TermLibrary(int someOtherInput);
};
Apparently, our library doesn't care about multi-threading, reentrancy or whatnot. Let's suppose our Init function should only be called once, calling it again with any other input would wreak havoc.
What's the best way to communicate this to my caller? I can think of two ways:
Inside InitLibrary, I assert some static variable which will blame my caller for init'ing twice.
Inside InitLibrary, I check some static variable and silently aborts if my lib has already been initialized.
Method #1 obviously is explicit, while method #2 makes it more user friendly. I am thinking that method #2 probably has the disadvantage that my caller wouldn't be aware of the fact that InitLibrary shouln't be called twice.
What would be the pros/cons of each approach? Is there a cleverer way to subvert all these?
Edit
I know that the example here is very contrived. As #daemon pointed out, I should initialized myself and not bother the caller. Practically however, there are places where I need more information to properly initialize myself (note the use of my variable name someMagicInputRequiredAtRuntime). This is not restricted to initialization/termination but other instances where the dilemma exists whether I should choose to be quote-and-quote "fault tolorent" or fail lousily.
I would definitely go for approach 1, along with an easy-to-understand exception and good documentation that explains why this fails. This will force the caller to be aware that this can happen, and the calling class can easily wrap the call in a try-catch statement if needed.
Failing silently, on the other hand, will lead your users to believe that the second call was successful (no error message, no exception) and thus they will expect that the new values are set. So when they try to do something else with Foo, they don't get the expected results. And it's darn near impossible to figure out why if they don't have access to your source code.
Serenity Prayer (modified for interfaces)
SA, grant me the assertions
to accept the things devs cannot change
the code to except the things they can,
and the conditionals to detect the difference
If the fault is in the environment, then you should try and make your code deal with it. If it is something that the developer can prevent by fixing their code, it should generate an exception.
A good approach would be to have a factory that creates an intialized library object (this would require you to wrap your library in a class). Multiple create-calls to the factory would create different objects. This way, the initialize-method would then not be a part of the public interface of the library, and the factory would manage initialization.
If there can be only one instance of the library active, make the factory check for existing instances. This would effectively make your library-object a singleton.
I would suggest that you should flag an exception if your routine cannot achieve the expected post-condition. If someone calls your init routine twice, and the system state after calling it the second time will be the same would be the same as if it had just been called once, then it is probably not necessary to throw an exception. If the system state after the second call would not match the caller's expectation, then an exception should be thrown.
In general, I think it's more helpful to think in terms of state than in terms of action. To use an analogy, an attempt to open as "write new" a file that is already open should either fail or result in a close-erase-reopen. It should not simply perform a no-op, since the program will be expecting to be writing into an empty file whose creation time matches the current time. On the other hand, trying to close a file that's already closed should generally not be considered an error, because the desire is that the file be closed.
BTW, it's often helpful to have available a "Try" version of a method that might throw an exception. It would be nice, for example, to have a Control.TryBeginInvoke available for things like update routines (if a thread-safe control property changes, the property handler would like the control to be updated if it still exists, but won't really mind if the control gets disposed; it's a little irksome not being able to avoid a first-chance exception if a control gets closed when its property is being updated).
Have a private static counter variable in your class. If it is 0 then do the logic in Init and increment the counter, If it is more than 0 then simply increment the counter. In Term do the opposite, decrement until it is 0 then do the logic.
Another way is to use a Singleton pattern, here is a sample in C++.
I guess one way to subvert this dilemma is to fulfill both camps. Ruby has the -w warning switch, it is custom for gcc users to -Wall or even -Weffc++ and Perl has taint mode. By default, these "just work," but the more careful programmer can turn on these strict settings themselves.
One example against the "always complain the slightest error" approach is HTML. Imagine how frustrated the world would be if all browsers would bark at any CSS hacks (such as drawing elements at negative coordinates).
After considering many excellent answers, I've come to this conclusion for myself: When someone sits down, my API should ideally "just work." Of course, for anyone to be involved in any domain, he needs to work at one or two level of abstractions lower than the problem he is trying to solve, which means my user must learn about my internals sooner or later. If he uses my API for long enough, he will begin to stretch the limits and too much efforts to "hide" or "encapsulate" the inner workings will only become nuisance.
I guess fault tolerance is most of the time a good thing, it's just that it's difficult to get right when the API user is stretching corner cases. I could say the best of both worlds is to provide some kind of "strict mode" so that when things don't "just work," the user can easily dissect the problem.
Of course, doing this is a lot of extra work, so I may be just talking ideals here. Practically it all comes down to the specific case and the programmer's decision.
If your language doesn't allow this error to surface statically, chances are good the error will surface only at runtime. Depending on the use of your library, this means the error won't surface until much later in development. Possibly only when shipped (again, depends on alot).
If there's no danger in silently eating an error (which isn't a real error anyway, since you catch it before anything dangerous happens), then I'd say you should silently eat it. This makes it more user friendly.
If however someMagicInputRequiredAtRuntime varies from calling to calling, I'd raise the error whenever possible, or presumably the library will not function as expected ("I init'ed the lib with value 42, but it's behaving as if I initted with 11!?").
If this Library is a static class, (a library type with no state), why not put the call to Init in the type initializer? If it is an instantiatable type, then put the call in the constructor, or in the factory method that handles instantiation.
Don;t allow public access to the Init function at all.
I think your interface is a bit too technical. No programmer want to learn what concept you have used while designing the API. Programmers want solutions for their actual problems and don't want to learn how to use an API. Nobody wants to init your API, that is something that the API should handle in the background as far as possible. Find a good abstraction that shields the developer from as much low-level technical stuff as possible. That implies, that the API should be fault tolerant.

Resources