How can I stream arrays with gRPC?

I want to transport an array of int64 values.
I looked up how to do this. In my proto file I either need a stream:
service myService {
  rpc GetValues(myRequest) returns (stream myResponse);
}

message myRequest {
}

message myResponse {
  int64 values = 1;
}
or a repeated response:
message myRepeatedResponse {
  repeated int64 value = 1;
}
Is one option better than the other?
My use case is that I want to read the latest x entries from my database and send these values as an array to my client.
But I don't understand how I am supposed to do this, because when assigning the values in the overridden method of MyService.MyServiceBase I can only pass values of type 'long' and not 'long[]'.

For the stream vs repeated question, the answer is: it depends.
The distinction between the two is that:
streaming sends one or more messages (each message possibly containing repeated fields)
unary sends a single message containing a repeated field
So, I think your decision is based upon:
how the server obtains the repeated field.
the size of the message (including the repeated field that's being sent)
the 'integrity' of the message
If the server is unable to obtain the entirety of the repeated field in one go, then your answer is simpler; the server will need to stream the messages (including the repeated field) as it obtains them.
By 'integrity' of the message, I mean: is there some reason why decomposing the message into many (to stream) would be problematic? If the repeated field must be transmitted as a single chunk, almost as a transactional unit, then you may prefer not to stream the message in chunks.
You should also consider the consequences for your client(s). Are your clients able to receive one larger message, or would many smaller messages be preferred, e.g. for an IoT SoC device that's resource-constrained?
Otherwise, if individual messages are large [1], then you'd want to decompose them into smaller 'bites' and stream them.
[1]: See Large Data Set; note that there is a hard limit of 2 GB per message.
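To make the second part of your question concrete: with a repeated field you never pass an array directly; the generated builder takes a whole collection. Here is a minimal sketch of both options in Kotlin against the grpc-java API (your question looks like C#, but the generated builders behave the same way), assuming the proto identifiers above are capitalized in the usual way (MyService, MyRequest, MyResponse, MyRepeatedResponse), plus a hypothetical unary rpc GetAllValues(MyRequest) returns (MyRepeatedResponse):
import io.grpc.stub.StreamObserver

class MyServiceImpl : MyServiceGrpc.MyServiceImplBase() {

    // Streaming option: one message per value; the client consumes
    // them as they arrive.
    override fun getValues(request: MyRequest, responseObserver: StreamObserver<MyResponse>) {
        for (v in readLatestValues()) {
            responseObserver.onNext(MyResponse.newBuilder().setValues(v).build())
        }
        responseObserver.onCompleted()
    }

    // Unary option: the builder of a repeated int64 field accepts a
    // whole collection via addAllValue(), so no long[] is needed.
    override fun getAllValues(request: MyRequest, responseObserver: StreamObserver<MyRepeatedResponse>) {
        responseObserver.onNext(
            MyRepeatedResponse.newBuilder().addAllValue(readLatestValues()).build()
        )
        responseObserver.onCompleted()
    }

    // Hypothetical stand-in for reading the latest x entries from the DB.
    private fun readLatestValues(): List<Long> = listOf(1L, 2L, 3L)
}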

Related

How does gRPC handle a pointer that appears more than once?

For example (golang):
type (
    Product struct {
        Name string
    }
    Customer struct {
        Name     string
        Products []*Product
    }
)
Which is the correct behavior:
1. gRPC honors the *Product pointer and transfers it only once.
2. gRPC transfers the same *Product as many times as it is associated with different Customers.
Michael,
It is not clear from your message, but I am assuming that you will send a Customer as part of your request to a gRPC server.
Golang will marshal the struct into []byte (https://godoc.org/github.com/golang/protobuf/proto#Marshal), so the message will not contain any such thing as a pointer. It will be just an encoded message (see
https://github.com/golang/protobuf/blob/master/proto/wire.go#L22).
gRPC is not a Golang thing, so a pointer on one side (e.g. the server) does not mean it must be a pointer on the other side (e.g. the client).
Finally, answering your question: the expected behavior is 2. However, you may want to take a deeper look into protobuf serialization (https://developers.google.com/protocol-buffers/docs/encoding). I have no idea how it works internally, but maybe the message is compressed, so repeated byte sequences may be deduplicated.
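To see behavior 2 concretely, here is a small Kotlin sketch with protobuf-java, assuming message classes generated from a proto equivalent of the structs above, i.e. message Product { string name = 1; } and message Customer { string name = 1; repeated Product products = 2; }:
val shared = Product.newBuilder().setName("Widget").build()

// The same in-memory Product instance is attached to two Customers...
val alice = Customer.newBuilder().setName("Alice").addProducts(shared).build()
val bob = Customer.newBuilder().setName("Bob").addProducts(shared).build()

// ...but on the wire each Customer embeds a full copy of the Product's
// bytes; the encoding has no notion of a shared reference.
println(alice.toByteArray().size)  // includes "Widget"
println(bob.toByteArray().size)    // also includes "Widget"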

Returning multiple items in gRPC: repeated List or stream single objects?

gRPC newbie here. I have a simple API:
Customer getCustomer(int id)
List<Customer> getCustomers()
So my proto looks like this:
message ListCustomersResponse {
  repeated Customer customer = 1;
}

rpc ListCustomers (google.protobuf.Empty) returns (ListCustomersResponse);
rpc GetCustomer (GetCustomerRequest) returns (Customer);
I was trying to follow Google's lead on the style. Originally I had returns (stream Customer) for GetCustomers, but Google seems to favor the ListXxxResponse style. When I generate the code, it ends up being:
public void getCustomers(com.google.protobuf.Empty request,
        StreamObserver<ListCustomersResponse> responseObserver) {
vs:
public void getCustomers(com.google.protobuf.Empty request,
        StreamObserver<Customer> responseObserver) {
Am I missing something? Why would I want to go through the hassle of creating a ListCustomersResponse when I can just do stream Customer and get the streaming functionality?
The ListCustomersResponse version just sends the whole list at once, vs streaming each customer. Google's preference seems to be to return the ListCustomersResponse style all of the time.
When is it appropriate to use the ListXxxResponse style vs the stream response?
This question is hard to answer without knowing what reference you're using. It's possible there's a miscommunication, or that the reference is simply wrong.
If you're looking at the gRPC Basics tutorial though, then I might have an inkling as to what caused a miscommunication. If that's indeed your reference, then it does not recommend returning repeated fields for streamed responses; your intuition is correct: you would just want to stream the singular Customer.
Here is what it says: the tutorial defines rpc ListFeatures(Rectangle) returns (stream Feature).
You might be reading rpc ListFeatures(Rectangle) as meaning an endpoint that returns a list [noun] of features. If so, that's the miscommunication: the guide actually means an endpoint to list [verb] features. It would have been less confusing if they had just written rpc GetFeatures(Rectangle).
So, your proto should look more like this,
rpc GetCustomers (google.protobuf.Empty) returns (stream Customer);
rpc GetCustomer (GetCustomerRequest) returns (Customer);
generating exactly what you suspected made more sense.
Update
Ah I see, so you're looking at this example in googleapis:
// Lists shelves. The order is unspecified but deterministic. Newly created
// shelves will not necessarily be added to the end of this list.
rpc ListShelves(ListShelvesRequest) returns (ListShelvesResponse) {
  option (google.api.http) = {
    get: "/v1/shelves"
  };
}

...

// Response message for LibraryService.ListShelves.
message ListShelvesResponse {
  // The list of shelves.
  repeated Shelf shelves = 1;

  // A token to retrieve next page of results.
  // Pass this value in the
  // [ListShelvesRequest.page_token][google.example.library.v1.ListShelvesRequest.page_token]
  // field in the subsequent call to `ListShelves` method to retrieve the next
  // page of results.
  string next_page_token = 2;
}
Yeah, I think you've probably figured the same by now, but here they have chosen to use a simple RPC, as opposed to a server-side streaming RPC (see the gRPC docs). I emphasize this because I think the important choice is not the stylistic difference between repeated versus stream, but rather the difference between a simple request-response API and a more complex, less ubiquitous streaming API.
In the googleapis example above, they're defining an API that returns a fixed, static number of items per page, e.g. 10 or 50. It would simply be overcomplicated to use streaming for this, when pagination is already so well understood and prevalent in software architecture and REST APIs. I think that is what they should have said, rather than "a small number." So the complexity of streaming (and the learning cost to you and future maintainers) has to be justified, that's all. For example, suppose you're actually fetching thousands of (x, y, z) points for a point cloud, or you're creating a live-updating bid-ask visualizer for some cryptocurrency.
Then you'd start asking yourself, "Is a simple request-response API my best option here?" So it just tends to be that the larger the number of items needing to be returned, the more streaming APIs start to make sense. And that can be for conceptual reasons, e.g. the items are a live-updating stream in time like the crypto example above, or architectural ones, e.g. it would be more efficient to start displaying results in the UI as partial data streams back. I think the "small number" phrasing you read was an oversimplification.
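If you do choose the streaming shape, the server side stays short. A Kotlin sketch against the grpc-java API, assuming a hypothetical CustomerService generated from rpc GetCustomers (google.protobuf.Empty) returns (stream Customer), with Customer assumed to have a string name field and fetchCustomers() standing in for the real data source:
import com.google.protobuf.Empty
import io.grpc.stub.StreamObserver

class CustomerServiceImpl : CustomerServiceGrpc.CustomerServiceImplBase() {

    // Server-side streaming: write each Customer as it becomes available,
    // so the client can start consuming partial results immediately.
    override fun getCustomers(request: Empty, responseObserver: StreamObserver<Customer>) {
        for (customer in fetchCustomers()) {
            responseObserver.onNext(customer)
        }
        responseObserver.onCompleted()
    }

    // Hypothetical stand-in for the real data source.
    private fun fetchCustomers(): List<Customer> =
        listOf(Customer.newBuilder().setName("Ada").build())
}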

Batching technique using Protobufs

Is there an efficient technique to batch different Protobuf events while sending them over HTTP?
The goal is to have a list of multi-type Protobuf messages in one request. One idea I have is to separate messages into small arrays and specify their type, to be able to deserialize them on the server.
You can use an Any message type combined with repeated, as follows:
message Any {
  string type_url = 1;
  bytes value = 2;
}

message Envelope {
  repeated Any events = 1;
}
Then, in your code:
when serializing, you must set type_url according to the message type that you serialize in value
when deserializing, you must read type_url to know which type is contained in value, and deserialize accordingly
The example above reproduces the google/protobuf/any well-known type, which is documented here:
https://developers.google.com/protocol-buffers/docs/proto3#any
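In practice, rather than redefining Any, you would import the well-known type and use its pack/unpack helpers, which manage type_url for you. A Kotlin sketch with protobuf-java, assuming Envelope's events field is typed as google.protobuf.Any and two hypothetical generated event types, OrderCreated and UserSignedUp:
import com.google.protobuf.Any

// Hypothetical handlers for the two event types.
fun handleOrder(e: OrderCreated) = println("order: ${e.orderId}")
fun handleSignup(e: UserSignedUp) = println("signup: ${e.userId}")

// Sender: pack each event into an Any; the type_url is filled in
// automatically from the message's descriptor.
val envelope = Envelope.newBuilder()
    .addEvents(Any.pack(OrderCreated.newBuilder().setOrderId("42").build()))
    .addEvents(Any.pack(UserSignedUp.newBuilder().setUserId("ada").build()))
    .build()

// Receiver: inspect the type of each Any and unpack accordingly.
for (event in envelope.eventsList) {
    when {
        event.`is`(OrderCreated::class.java) -> handleOrder(event.unpack(OrderCreated::class.java))
        event.`is`(UserSignedUp::class.java) -> handleSignup(event.unpack(UserSignedUp::class.java))
        else -> error("unknown event type: ${event.typeUrl}")
    }
}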

Corda oracles verification

I'm trying to understand how Corda oracles work from an example on GitHub. It seems like in every example the oracle verification function checks the data in the command against the data in the output state. I don't understand why that should work, because we (the issuer node) control that data and put it in the command/output state.
// Our contract does not check that the Nth prime is correct. Instead, it checks that the
// information in the command and state match.
override fun verify(tx: LedgerTransaction) = requireThat {
    "There are no inputs" using (tx.inputs.isEmpty())
    val output = tx.outputsOfType<PrimeState>().single()
    val command = tx.commands.requireSingleCommand<Create>().value
    "The prime in the output does not match the prime in the command." using
            (command.n == output.n && command.nthPrime == output.nthPrime)
}
In this example the state gets the Nth prime number from the oracle, but after it's issued the verification function doesn't rerun the prime-generating function to make sure that this number is really the one we requested. I understand that the data in this example is deterministic, since the Nth prime cannot change, but what about cases where we have dynamic data like stock values? Shouldn't the oracle verification function also send another HTTP request and fetch current values to check them?
Firstly, note that contracts in Corda are not able to access the outside world in any way (DB reads, HTTP requests, etc.). If they could, transaction validity would be non-deterministic. A transaction that is found to be valid on day n may become invalid on day n+1 (because a database row changed, or a website went down, etc.). This would cause disagreements about whether a given transaction was a valid ledger update.
However, we sometimes need a transaction to include external data for verification (whether a company is bankrupt, whether a natural catastrophe happened, etc.). To do this, we use a trusted oracle that only signs the transaction if a given piece of data is valid.
We could embed the information in the input or output states. However, this would require us to reveal the entire input or output state to the oracle for signing. For privacy reasons, it is therefore preferable to embed the data in a command that only contains the data of interest to the oracle, so that we can filter out all the other parts of the transaction and only present this command to the oracle for signing.
The oracle will usually perform a DB read or make an HTTP request to check the validity of the data before signing.
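To sketch the command-plus-filtering pattern in Kotlin: in the prime example, the initiating flow would tear off everything except the oracle's command before requesting a signature. CreatePrime and SignPrimeFlow are hypothetical names here, and builder/oracle/serviceHub are assumed to be in scope inside the flow; the exact details vary by CorDapp:
import java.util.function.Predicate
import net.corda.core.contracts.Command
import net.corda.core.transactions.FilteredTransaction

// Inside the initiating flow: reveal only the oracle's command.
val wtx = builder.toWireTransaction(serviceHub)
val ftx: FilteredTransaction = wtx.buildFilteredTransaction(Predicate {
    it is Command<*> && it.value is CreatePrime && oracle.owningKey in it.signers
})
// The oracle re-checks the data in the command (a DB read or HTTP
// request on its side) and signs only if the data is valid.
val oracleSig = subFlow(SignPrimeFlow(oracle, ftx))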

Is there a way to send multiple transactions to a counterparty without looping

Is there a way to send multiple transactions to a counterparty without using a loop in the flow? Sending one tx at a time in a loop impacts performance significantly, since Suspendable behaviour doesn't work well with a large volume of txes.
At some point in time, T, an initiator may be interested in sending N transactions to a regulator/counterparty. But the current SendTransactionFlow only sends one tx at a time, and on the other side, ReceiveTransactionFlow records them one by one.
My current code:
relevantTxes.forEach {
    subFlow(SendTransactionFlow(session, it))
}
Is there a way to do something along the lines of
subFlow(SendTransactionFlow(session, relevantTxes))
You can send the list of transactions without invoking a subflow by using send and receive.
On the sender's side:
val session = initiateFlow(otherParty)
session.send(relevantTxes)
On the receiver's side:
val relevantTxes = session.receive<List<SignedTransaction>>().unwrap { it }
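Putting it together, a minimal sketch of the flow pair (hypothetical class names; note that plain send/receive skips the dependency-resolution and verification work that SendTransactionFlow/ReceiveTransactionFlow do for you, so validate and record the transactions yourself on the receiving side):
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.flows.FlowLogic
import net.corda.core.flows.FlowSession
import net.corda.core.flows.InitiatedBy
import net.corda.core.flows.InitiatingFlow
import net.corda.core.identity.Party
import net.corda.core.transactions.SignedTransaction
import net.corda.core.utilities.unwrap

@InitiatingFlow
class SendTxListFlow(private val otherParty: Party,
                     private val relevantTxes: List<SignedTransaction>) : FlowLogic<Unit>() {
    @Suspendable
    override fun call() {
        val session = initiateFlow(otherParty)
        session.send(relevantTxes)  // a single suspension for the whole list
    }
}

@InitiatedBy(SendTxListFlow::class)
class ReceiveTxListFlow(private val session: FlowSession) : FlowLogic<Unit>() {
    @Suspendable
    override fun call() {
        val relevantTxes = session.receive<List<SignedTransaction>>().unwrap { it }
        // Validate and record the received transactions here.
    }
}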
